Proposal: Progressive explain
Hello community,
CONTEXT:
Back in October I presented the talk "Debugging active queries with
mid-flight instrumented explain plans" at PGConf EU 2024
(recording: https://www.youtube.com/watch?v=6ahTb-7C05c), where I
demonstrated an experimental feature that enables visualization of
in-progress EXPLAIN ANALYZE executions. Given the positive feedback and
requests, I am sending this patch with the feature, which I am calling
Progressive Explain.
PROPOSAL:
This proposal introduces a feature that prints execution plans of active
queries into an in-memory shared hash table so that other sessions can
visualize them through a new view: pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
For regular queries, or queries started with EXPLAIN (without ANALYZE),
the plan is printed only once, at the start of execution.
For instrumented runs (started via EXPLAIN ANALYZE, or when the auto_explain
flag log_analyze is enabled), the plan is printed at a fixed interval
controlled by the new GUC parameter progressive_explain_interval. This plan
includes all instrumentation stats computed so far (per-node row counts and
execution time).
New view:
- pg_stat_progress_explain
  - pid: PID of the process running the query
  - last_explain: timestamp when the plan was last printed
  - explain_count: number of times the plan was printed
  - total_explain_time: accumulated time spent printing plans (in ms)
  - explain: the actual plan (limited read privileges)
New GUCs (a sample per-session setup follows the list):
- progressive_explain: whether progressive plans are printed for the local
  session.
  - type: bool
  - default: off
  - context: user
- progressive_explain_interval: interval between each explain print.
  - type: int
  - default: 1s
  - min: 10ms
  - context: user
- progressive_explain_sample_rate: fraction of processed rows at which
  progressive_explain_interval is evaluated to decide whether to print a
  progressive plan.
  - type: floating point
  - default: 0.01
  - range: (0.0 - 1.0)
  - context: user
- progressive_explain_output_size: maximum output size of the plan
  stored in the in-memory hash table.
  - type: int
  - default: 4096
  - min: 100
  - context: postmaster
- progressive_explain_format: format used to print the plans.
  - type: enum
  - default: text
  - context: user
- progressive_explain_settings: controls whether information about
  modified configuration is added to the printed plan.
  - type: bool
  - default: off
  - context: user
- progressive_explain_verbose: controls whether verbose details are
  added to the printed plan.
  - type: bool
  - default: off
  - context: user
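As a reference, a typical per-session setup using the GUCs above could look
like this (values are illustrative, not recommendations; only
progressive_explain_output_size requires a server restart):

SET progressive_explain = on;
-- refresh the stored plan roughly every 500ms during instrumented runs
SET progressive_explain_interval = '500ms';
-- evaluate the interval on ~1% of processed rows (the default)
SET progressive_explain_sample_rate = 0.01;
-- include VERBOSE details and print the plan as JSON
SET progressive_explain_verbose = on;
SET progressive_explain_format = 'json';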
DEMONSTRATION:
postgres=# SET progressive_explain = ON;
SET
postgres=# EXPLAIN ANALYZE SELECT *
FROM test t1
UNION ALL
SELECT *
FROM test t1;
In a second session, while the query above is still running:
postgres=# select * from pg_stat_progress_explain;
-[ RECORD 1 ]------+---------------------------------------------------------------------------------------------------------------------------------------
pid                | 299663
last_explain       | 2024-12-29 22:40:33.016833+00
explain_count      | 5
total_explain_time | 0.205
explain            | Append  (cost=0.00..466670.40 rows=20000160 width=37) (actual time=0.052..3372.318 rows=14013813 loops=1)
                   |   Buffers: shared hit=4288 read=112501
                   |   ->  Seq Scan on test t1  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.052..1177.428 rows=10000000 loops=1)
                   |         Buffers: shared hit=4288 read=79046
                   |   ->  Seq Scan on test t1_1  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.072..608.481 rows=4013813 loops=1) (current)
                   |         Buffers: shared read=33455
                   |
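To follow the plan of a specific backend as it evolves, the view can simply
be polled from the monitoring session (the PID below is illustrative):

SELECT pid, explain_count, last_explain, explain
FROM pg_stat_progress_explain
WHERE pid = 299663;

In psql, the \watch meta-command can be used to re-run this query every few
seconds while the target query is executing.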
IMPLEMENTATION HIGHLIGHTS:
- The initial plan is printed at the end of standard_ExecutorStart
if progressive_explain is enabled, for both regular queries and
instrumented ones (EXPLAIN ANALYZE):
/*
* Start progressive explain if enabled.
*/
if (progressive_explain)
ProgressiveExplainBegin(queryDesc);
- The incremental plan printing for instrumented runs uses a dedicated
ExecProcNode wrapper if progressive_explain is enabled:
if (node->instrument)
    if (progressive_explain)
        node->ExecProcNode = ExecProcNodeInstrExplain;
    else
        node->ExecProcNode = ExecProcNodeInstr;
else
    node->ExecProcNode = node->ExecProcNodeReal;
- ExecProcNodeInstrExplain is identical to ExecProcNodeInstr, with an
additional step that prints plans based on sampling logic:
/*
 * Update progressive explain based on sampling.
 */
if (pg_prng_double(&pg_global_prng_state) < progressive_explain_sample_rate)
    ProgressiveExplainUpdate(node);
That logic was added because ExecProcNodeInstrExplain is called once per
processed row (potentially a huge number of calls), and performing the
timestamp interval check against progressive_explain_interval to decide
whether to print the plan (done inside ProgressiveExplainUpdate) is
expensive. Benchmarks (shared at the end of this email) show that
sampling + timestamp check gives much better results than performing the
timestamp check at every ExecProcNodeInstrExplain call.
- The plans are stored in a shared hash table (explainArray) allocated
at server start, similar to procArray. ExplainHashShmemSize() computes the
shared memory needed for it, based on max_connections + max_parallel_workers
for the number of entries and progressive_explain_output_size for the size
per entry.
- A memory context release callback is registered in the memory context
where the query is running; it is responsible for removing the query's entry
from explainArray even when the query doesn't finish gracefully.
- Instrumented plans printed incrementally need to clone the
instrumentation objects before changing them, so each print uses a dedicated
memory context that gets released after the output is constructed. This
avoids growing private memory usage over time:
/* Dedicated memory context for the current plan being printed */
tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
                               "Progressive Explain Temporary Context",
                               ALLOCSET_DEFAULT_SIZES);
- A new variant of InstrEndLoop (InstrEndLoopForce) was added that can be
called on in-progress instrumentation objects, which are common when
traversing the plan tree of an active query.
- Column explain from pg_stat_progress_explain can only be read by
superusers or by the role that is running the query. If neither condition
is met, users will see "<insufficient privilege>" (see the example after
this list).
- For instrumented runs, the printed plan includes 2 per-node modifiers
when applicable:
  <current>: the plan node currently being processed.
  <never executed>: a plan node not processed yet.
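As an illustration of the read privileges mentioned above, a non-superuser
role that is not running the query would see something like this (role name
and output are illustrative):

monitor_user=> SELECT pid, explain FROM pg_stat_progress_explain;
  pid   |         explain
--------+--------------------------
 299663 | <insufficient privilege>
(1 row)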
IMPLEMENTATION OVERHEAD:
When the feature is not used, the overhead added is:
- One IF at standard_ExecutorStart to check whether progressive_explain is
enabled
- For instrumented runs (EXPLAIN ANALYZE), one IF at ExecProcNodeFirst
to choose the ExecProcNode wrapper
BENCHMARKS:
Three benchmark scenarios were performed:
A) Comparison between unpatched PG18, patched with progressive explain
disabled, and patched with the feature enabled globally (all queries print
the plan at query start):
- PG18 without patch:
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -S -n -T 120 -c 30
number of transactions actually processed: 2173978
latency average = 1.655 ms
tps = 18127.977529 (without initial connection time)
- PG18 with patch:
-- progressive_explain = off
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -S -n -T 120 -c 30
number of transactions actually processed: 2198806
latency average = 1.636 ms
tps = 18333.157809 (without initial connection time)
-- progressive_explain = on (prints plan only once per query)
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -S -n -T 120 -c 30
number of transactions actually processed: 2047459
latency average = 1.756 ms
tps = 17081.477199 (without initial connection time)
B) EXPLAIN ANALYZE performance with different progressive_explain_interval
settings in the patched build:
-- progressive_explain = off
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f script.sql
number of transactions actually processed: 27
latency average = 4492.845 ms
-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- progressive_explain_sample_rate = 0.01 (default)
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 300 -c 1 -f script.sql
number of transactions actually processed: 26
latency average = 4656.067 ms
-- progressive_explain = on
-- progressive_explain_interval = 10ms
-- progressive_explain_sample_rate = 0.01 (default)
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 300 -c 1 -f script.sql
number of transactions actually processed: 26
latency average = 4785.608 ms
C) EXPLAIN ANALYZE performance in the patched build with and without
progressive_explain_sample_rate, i.e., sampling with 2 different values
and also with no sampling logic at all:
-- progressive_explain = off
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f script.sql
number of transactions actually processed: 27
latency average = 4492.845 ms
-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- progressive_explain_sample_rate = 0.01 (default)
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f script.sql
number of transactions actually processed: 26
latency average = 4656.067 ms
-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- progressive_explain_sample_rate = 1
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f script.sql
number of transactions actually processed: 19
latency average = 6432.902 ms
-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- NO SAMPLING LOGIC
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f script.sql
number of transactions actually processed: 21
latency average = 5864.820 ms
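For scenarios (B) and (C), script.sql stands for a single long-running
instrumented query; a hypothetical sketch of what it could contain:

-- hypothetical content of script.sql: one instrumented query taking a few seconds
EXPLAIN ANALYZE SELECT *
FROM test t1
UNION ALL
SELECT *
FROM test t1;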
BENCHMARK RESULTS:
It definitely needs more testing, but preliminary results show that:
- From (A) we see that the patch adds negligible overhead when the
feature is not used. Enabling it globally reduces overall TPS, as all
queries spend time printing the plan. The idea is to enable
progressive_explain on an as-needed basis, only for the subset of sessions
that need it.
- From (B) we see that using progressive explain slightly increases
total execution time. The difference between progressive_explain_interval
set to 1s (plan printed 4 times per query in the test) and to
10ms (plan printed ~460 times per query in the test) is very small.
The actual overhead appears when changing progressive_explain_sample_rate.
- From (C) we see that progressive_explain_sample_rate with a low
value (default 0.01) performs better than not using sampling at all or
using progressive_explain_sample_rate = 1. So the overhead of the sampling
logic itself is much lower than checking the timestamp on every call.
TESTS:
I am currently working on tests for a second version of the patch.
DOCUMENTATION:
Added documentation for the new view pg_stat_progress_explain,
new GUCs and a new item in section 14.1:
14.1. Using EXPLAIN
14.1.1. EXPLAIN Basics
14.1.2. EXPLAIN ANALYZE
14.1.3. Caveats
14.1.4. Progressive EXPLAIN
FURTHER DISCUSSION:
Considering that this patch introduces a new major feature with
several new components (view, GUCs, etc), there is open room for
discussion such as:
- Do the columns in pg_stat_progress_explain make sense? Are we
missing or adding unnecessary columns?
- Do the new GUCs make sense and are their default values appropriate?
- Do we want progressive explain to print plans of regular queries
started without EXPLAIN if progressive_explain is enabled or should
the feature be restricted to instrumented queries (EXPLAIN ANALYZE)?
- Is the size of explainHash based on max_connections + max_parallel_workers
large enough or are there other types of backends that use the
executor and will print plans too?
Regards,
Rafael Castro.
Attachments:
v1-0001-Proposal-for-progressive-explains.patch
From 8e5bc90c7d1aedbf1f8bd2effe9428d147094539 Mon Sep 17 00:00:00 2001
From: Rafael Castro <rafaelthca@gmail.com>
Date: Sun, 17 Nov 2024 20:37:04 -0300
Subject: [PATCH v1] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries into an in-memory shared hash table so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
For regular queries or queries started with EXPLAIN (without ANALYZE)
the plan is printed only once at the start.
For instrumented runs (started via EXPLAIN ANALYZE or when the auto_explain
flag log_analyze is enabled) the plan is printed at a fixed interval
controlled by the new GUC parameter progressive_explain_interval, including
all instrumentation stats computed so far (per-node row counts and execution
time).
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_explain: timestamp when plan was last printed
- explain_count: number of times the plan was printed
- total_explain_time: accumulated time spent printing plans (in ms)
- explain: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: bool
- default: off
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_sample_rate: fraction of processed rows at which
progressive_explain_interval is evaluated to decide whether to print a
progressive plan
- type: floating point
- default: 0.01
- range: 0.0 - 1.0
- context: user
- progressive_explain_output_size: max output size of the plan
printed in the in-memory hash table.
- type: int
- default: 4096
- min: 100
- context: postmaster
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
---
doc/src/sgml/config.sgml | 109 ++++
doc/src/sgml/monitoring.sgml | 84 +++
doc/src/sgml/perform.sgml | 100 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/commands/explain.c | 479 ++++++++++++++++--
src/backend/executor/execMain.c | 12 +
src/backend/executor/execProcnode.c | 35 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/misc/guc_tables.c | 92 ++++
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 21 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 5 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 7 +
17 files changed, 937 insertions(+), 52 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a84e60c09b..bde7631268 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8462,6 +8462,115 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled; see
+ <xref linkend="using-explain-progressive"/>. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-output-size" xreflabel="progressive_explain_output_size">
+ <term><varname>progressive_explain_output_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_output_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the amount of memory reserved to store the text of the
+ progressive explain for each client backend or parallel worker, for the
+ <structname>pg_stat_progress_explain</structname>.<structfield>explain</structfield>
+ field. If this value is specified without units, it is taken as bytes.
+ The default value is <literal>4096 bytes</literal>.
+ This parameter can only be set at server start.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-sample-rate" xreflabel="progressive_explain_sample_rate">
+ <term><varname>progressive_explain_sample_rate</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_sample_rate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Fraction of processed rows at which
+ <xref linkend="guc-progressive-explain-interval"/> is evaluated
+ to decide whether to print a progressive explain plan. The default
+ value is <literal>0.01</literal>, resulting in one check every 100
+ processed rows.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..d2beb91893 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6688,6 +6688,90 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_explain</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>explain_count</structfield> <type>integer</type>
+ </para>
+ <para>
+ Number of times the plan was printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_explain_time</structfield> <type>floating point</type>
+ </para>
+ <para>
+ Accumulated time spent printing plans (in ms).
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>explain</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text. By default the explain text is
+ truncated at 4096 bytes; this value can be changed via the
+ parameter <xref linkend="guc-progressive-explain-output-size"/>.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index cd12b9ce48..dd2d21edb3 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1091,6 +1091,106 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query, or
+ detailed plan with row counts and accumulated run time when
+ <command>EXPLAIN ANALYZE</command> is used, can be visualized by any
+ session via <xref linkend="pg-stat-progress-explain-view"/> view when
+ <xref linkend="guc-progressive-explain"/> is enabled in the client
+ backend or parallel worker executing query. Settings
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/> and
+ <xref linkend="guc-progressive-explain-settings"/> can be adjusted
+ to customize the printed plan, containing a length limit defined by
+ <xref linkend="guc-progressive-explain-output-size"/>.
+ </para>
+
+ <para>
+ For queries executed without <command>EXPLAIN ANALYZE</command> the
+ plan is printed only once at the beginning of query execution:
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = on;
+SET
+
+SELECT *
+FROM test t1
+INNER JOIN test t2 ON (t1.c1 = t2.c1);
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT pid, explain_count, explain FROM pg_stat_progress_explain;
+ pid | explain_count | explain
+-----+---------------+--------------------------------------------------------------------------------
+ 159 | 1 | Hash Join (cost=1159375.00..3912500.00 rows=30000000 width=74)
+ | | Hash Cond: (t1.c1 = t2.c1)
+ | | -> Seq Scan on test t1 (cost=0.00..550000.00 rows=30000000 width=37)
+ | | -> Hash (cost=550000.00..550000.00 rows=30000000 width=37)
+ | | -> Seq Scan on test t2 (cost=0.00..550000.00 rows=30000000 width=37)
+ | |
+</screen>
+ </para>
+
+ <para>
+ When <command>EXPLAIN ANALYZE</command> is used the detailed plan is
+ printed progressively based on
+ <xref linkend="guc-progressive-explain-interval"/> and
+ <xref linkend="guc-progressive-explain-sample-rate"/> settings, including
+ per node accumulated row count and run time statistics computed so far. This
+ makes progressive explain a powerful ally when investigating bottlenecks in
+ expensive queries without having to wait for <command>EXPLAIN ANALYZE</command>
+ to finish.
+ </para>
+
+ <para>
+ Progressive explains include additional information per node to help analyzing
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = on;
+SET
+
+EXPLAIN ANALYZE SELECT *
+FROM test t1
+INNER JOIN test t2 ON (t1.c1 = t2.c1);
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT pid, explain_count, explain FROM pg_stat_progress_explain;
+ pid | explain_count | explain
+-----+---------------+----------------------------------------------------------------------------------------------------------------------------------------------
+ 159 | 7 | Hash Join (cost=1159375.00..3912500.00 rows=30000000 width=74) (never executed)
+ | | Hash Cond: (t1.c1 = t2.c1)
+ | | -> Seq Scan on test t1 (cost=0.00..550000.00 rows=30000000 width=37) (actual time=0.009..0.009 rows=1 loops=1)
+ | | -> Hash (cost=550000.00..550000.00 rows=30000000 width=37) (never executed)
+ | | -> Seq Scan on test t2 (cost=0.00..550000.00 rows=30000000 width=37) (actual time=0.004..2165.201 rows=27925599 loops=1) (current)
+ | |
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index da9a8fe99f..4021b1ee6b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1325,6 +1325,11 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7c0fd63b2f..1f37ec755d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -13,6 +13,8 @@
*/
#include "postgres.h"
+#include <time.h>
+
#include "access/xact.h"
#include "catalog/pg_type.h"
#include "commands/createas.h"
@@ -22,6 +24,8 @@
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
+#include "miscadmin.h"
+#include "funcapi.h"
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -40,6 +44,12 @@
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+#include "utils/backend_status.h"
+#include "storage/procarray.h"
+#include "executor/spi.h"
+#include "utils/guc.h"
+
+
/* Hook for plugins to get control in ExplainOneQuery() */
@@ -48,6 +58,8 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *explainArray = NULL;
/* Instrumentation data for SERIALIZE option */
typedef struct SerializeMetrics
@@ -140,7 +152,7 @@ static void show_hashagg_info(AggState *aggstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -180,6 +192,8 @@ static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
static void escape_yaml(StringInfo buf, const char *str);
static SerializeMetrics GetSerializationMetrics(DestReceiver *dest);
+void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ExplainTrackQueryReleaseFunc(void *);
@@ -385,6 +399,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -1497,6 +1513,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1960,24 +1977,56 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is done directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *planstate->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
if (es->timing)
+ /* Node in progress */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f) (current)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ /* Node in progress */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
appendStringInfo(es->str,
- " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
- startup_ms, total_ms, rows, nloops);
+ " (actual rows=%.0f loops=%.0f) (current)",
+ rows, nloops);
else
appendStringInfo(es->str,
" (actual rows=%.0f loops=%.0f)",
@@ -1992,6 +2041,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
3, es);
}
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
+ {
+ ExplainPropertyBool("Current", true, es);
+ }
ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
}
@@ -2100,29 +2154,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -2133,11 +2187,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2154,7 +2208,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2165,7 +2219,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2189,7 +2243,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2223,7 +2277,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2237,7 +2291,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2255,7 +2309,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2272,14 +2326,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2289,7 +2343,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2299,11 +2353,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2312,11 +2366,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2325,11 +2379,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2337,13 +2391,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(((WindowAgg *) plan)->runConditionOrig,
"Run Condition", planstate, ancestors, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
@@ -2353,7 +2407,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2375,7 +2429,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2420,10 +2474,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -2562,6 +2616,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainCloseGroup("Plan",
relationship ? NULL : "Plan",
true, es);
+
+ /* Progressive explain. Free cloned instrumentation object */
+ if (es->progressive && local_instr)
+ {
+ pfree(local_instr);
+ }
}
/*
@@ -3940,19 +4000,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4618,7 +4678,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4627,11 +4687,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4639,6 +4712,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
insert_path, 0, es);
ExplainPropertyFloat("Conflicting Tuples", NULL,
other_path, 0, es);
+
+ /* Progressive explain. Free cloned instrumentation object */
+ if (es->progressive)
+ pfree(local_instr);
}
}
else if (node->operation == CMD_MERGE)
@@ -4651,11 +4728,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -4686,6 +4776,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
ExplainPropertyFloat("Tuples Deleted", NULL, delete_path, 0, es);
ExplainPropertyFloat("Tuples Skipped", NULL, skipped_path, 0, es);
}
+
+ /* Progressive explain. Free cloned instrumentation object */
+ if (es->progressive)
+ pfree(local_instr);
}
}
@@ -5910,3 +6004,290 @@ GetSerializationMetrics(DestReceiver *dest)
return empty;
}
+
+
+/*
+ * ProgressiveExplainPrint
+ * Prints progressive explain in memory.
+ *
+ * This operation needs to be done in a dedicated memory context
+ * as plans for instrumented runs will be printed multiple times
+ * and instrumentation objects need to be cloned so that stats
+ * can get updated without interfering with original objects.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ MemoryContext tmpCxt;
+ MemoryContext oldCxt;
+ instr_time starttime;
+
+ INSTR_TIME_SET_CURRENT(starttime);
+
+ /* Dedicated memory context for the current plan being printed */
+ tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Progressive Explain Temporary Context",
+ ALLOCSET_DEFAULT_SIZES);
+ oldCxt = MemoryContextSwitchTo(tmpCxt);
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ explainHashKey key;
+ explainHashEntry *entry;
+ ExplainState *es;
+
+ es = NewExplainState();
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+ /* Instrumentation details come from the active QueryDesc */
+ es->analyze = queryDesc->instrument_options;
+ es->buffers = (queryDesc->instrument_options &
+ INSTRUMENT_BUFFERS) != 0;
+ es->wal = (queryDesc->instrument_options &
+ INSTRUMENT_WAL) != 0;
+ es->timing = (queryDesc->instrument_options &
+ INSTRUMENT_TIMER) != 0;
+ es->summary = (es->analyze);
+
+ /* Additional options come from progressive GUC settings */
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ key.pid = MyProcPid;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ entry = (explainHashEntry *) hash_search(explainArray, &key, HASH_FIND, NULL);
+
+
+ if (entry)
+ {
+ entry->explain_count++;
+ strlcpy(entry->plan, es->str->data, progressive_explain_output_size);
+ entry->explain_duration += elapsed_time(&starttime);
+ entry->last_explain = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ExplainHashLock);
+
+ /*
+ * Free local explain state before exiting as this function may be
+ * called multiple times in the same memory context.
+ */
+ pfree(es->str);
+ pfree(es);
+ }
+
+ /* Clean up temp context */
+ MemoryContextSwitchTo(oldCxt);
+ MemoryContextDelete(tmpCxt);
+}
+
+/*
+ * ProgressiveExplainBegin
+ * Enables progressive explain progress tracking for a query in the local backend.
+ *
+ * A progressive explain is printed at the beginning of every query if progressive_explain
+ * is enabled.
+ *
+ * For instrumented runs started with EXPLAIN ANALYZE the progressive plan is printed
+ * via ExecProcNodeInstrExplain at a regular interval controlled by progressive_explain_interval.
+ *
+ * Plans are stored in shared memory object explainArray that needs to be properly
+ * cleared when the query finishes or gets canceled. This is achieved with the help
+ * of a memory context callback configured in the same memory context where the query
+ * descriptor was created. This strategy allows cleaning explainArray even when the
+ * query doesn't finish gracefully.
+ */
+void
+ProgressiveExplainBegin(QueryDesc *queryDesc)
+{
+ explainHashKey key;
+ explainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ExplainTrackQueryReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ INSTR_TIME_SET_CURRENT(queryDesc->estate->progressive_explain_interval_time);
+
+ key.pid = MyProcPid;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ entry = (explainHashEntry *) hash_search(explainArray, &key, HASH_ENTER, &found);
+ entry->pid = MyProcPid;
+ entry->explain_count = 0;
+ entry->explain_duration = 0.0f;
+ strcpy(entry->plan, "");
+ entry->last_explain = 0;
+
+ LWLockRelease(ExplainHashLock);
+
+ /* Printing progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /*
+ * Update explain plan only if progressive_explain_interval has passed since the previous print.
+ */
+ if (elapsed_time(&node->state->progressive_explain_interval_time) * 1000.0 > progressive_explain_interval)
+ {
+ node->state->progressive_explain_current_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->progressive_explain_current_node = NULL;
+ INSTR_TIME_SET_CURRENT(node->state->progressive_explain_interval_time);
+ }
+}
+
+/*
+ * ExplainTrackQueryReleaseFunc
+ * Memory context release callback function to remove
+ * plan from explain hash.
+ */
+static void
+ExplainTrackQueryReleaseFunc(void *)
+{
+ /* Remove row from hash */
+ explainHashKey key;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ hash_search(explainArray, &key, HASH_REMOVE, NULL);
+ LWLockRelease(ExplainHashLock);
+}
+
+/*
+ * ExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends, max_parallel_workers);
+
+ size = add_size(size, hash_estimate_size(max_table_size, add_size(sizeof(explainHashEntry), progressive_explain_output_size)));
+
+ return size;
+}
+
+/*
+ * InitExplainHash
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(explainHashKey);
+ info.entrysize = sizeof(explainHashEntry) + progressive_explain_output_size;
+
+ explainArray = ShmemInitHash("explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+ char duration[1024];
+#define EXPLAIN_ACTIVITY_COLS 5
+ HASH_SEQ_STATUS hash_seq;
+ explainHashEntry *entry;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, explainArray);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ values[0] = Int32GetDatum(entry->pid);
+ values[1] = TimestampTzGetDatum(entry->last_explain);
+ values[2] = Int32GetDatum(entry->explain_count);
+ snprintf(duration, sizeof(duration), "%.3f", 1000.0 * entry->explain_duration);
+ values[3] = CStringGetTextDatum(duration);
+
+ if (superuser())
+ values[4] = CStringGetTextDatum(entry->plan);
+ else
+ {
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ bool found = false;
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == entry->pid)
+ {
+ found = true;
+ if (beentry->st_userid == GetUserId())
+ values[4] = CStringGetTextDatum(entry->plan);
+ else
+ values[4] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+
+ if (!found)
+ values[4] = CStringGetTextDatum("<insufficient privilege>");
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ }
+ LWLockRelease(ExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5ca856fd27..e34c8f03fe 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -61,6 +61,7 @@
#include "utils/partcache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
+#include "commands/explain.h"
/* Hooks for plugins to get control in ExecutorStart/Run/Finish/End */
@@ -174,6 +175,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Adding back reference to QueryDesc
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -260,6 +266,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain)
+ ProgressiveExplainBegin(queryDesc);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 34f28dfece..16f1407633 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -118,9 +118,13 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "commands/explain.h"
+#include "utils/guc.h"
+#include "common/pg_prng.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
+static TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
static bool ExecShutdownNode_walker(PlanState *node, void *context);
@@ -461,8 +465,12 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ if (progressive_explain)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
else
node->ExecProcNode = node->ExecProcNodeReal;
@@ -489,6 +497,31 @@ ExecProcNodeInstr(PlanState *node)
return result;
}
+/*
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled
+ */
+static TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update progressive explain based on sampling.
+ */
+ if (pg_prng_double(&pg_global_prng_state) < progressive_explain_sample_rate)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
/* ----------------------------------------------------------------
* MultiExecProcNode
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 268ae8a945..244c3591a1 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..25e70c63d5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "commands/explain.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up instrumented explain hash table
+ */
+ InitExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 16144c2b72..da08e7f3c9 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -345,6 +345,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ExplainHash "Waiting to access backend explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8a67f01200..b276955603 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -46,6 +46,7 @@
#include "commands/vacuum.h"
#include "common/file_utils.h"
#include "common/scram-common.h"
+#include "commands/explain.h"
#include "jit/jit.h"
#include "libpq/auth.h"
#include "libpq/libpq.h"
@@ -474,6 +475,14 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -528,6 +537,13 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+bool progressive_explain = false;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
+int progressive_explain_output_size = 4096;
+double progressive_explain_sample_rate = 0.01;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2076,6 +2092,39 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_verbose", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Controls whether information about modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3714,6 +3763,28 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_output_size", PGC_POSTMASTER, QUERY_TUNING_METHOD,
+ gettext_noop("Sets the size reserved for pg_stat_progress_explain.explain, in bytes."),
+ NULL
+ },
+ &progressive_explain_output_size,
+ 4096, 100, 1048576,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -3995,6 +4066,17 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_sample_rate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Fraction of rows processed by the query until progressive_explain_interval is evaluated "
+ "to print a progressive plan."),
+ gettext_noop("Use a value between 0.0 (never) and 1.0 (always).")
+ },
+ &progressive_explain_sample_rate,
+ 0.01, 0.0, 1.0,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0.0, 0.0, 0.0, NULL, NULL, NULL
@@ -5207,6 +5289,16 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..dbc1185cfc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12402,4 +12402,14 @@
proargtypes => 'int2',
prosrc => 'gist_stratnum_identity' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'bool',
+ proallargtypes => '{bool,int4,timestamptz,int4,text,text}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{mode,pid,last_explain,explain_count,total_explain_time,explain}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index aa5872bc15..b501d2eaa4 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -67,12 +67,28 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
} ExplainState;
+typedef struct explainHashKey
+{
+ int pid; /* PID */
+} explainHashKey;
+
+typedef struct explainHashEntry
+{
+ explainHashKey key; /* hash key of entry - MUST BE FIRST */
+ int pid;
+ TimestampTz last_explain;
+ int explain_count;
+ float explain_duration;
+ char plan[];
+} explainHashEntry;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -144,4 +160,9 @@ extern void ExplainCloseGroup(const char *objtype, const char *labelname,
extern DestReceiver *CreateExplainSerializeDestReceiver(ExplainState *es);
+extern void ProgressiveExplainBegin(QueryDesc *queryDesc);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern Size ExplainHashShmemSize(void);
+extern void InitExplainHash(void);
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index bfd7b6d844..2963a70e41 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -108,6 +108,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 182a6956bb..c57b4c28d2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -56,6 +56,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -735,6 +736,10 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ struct QueryDesc *query_desc;
+ instr_time progressive_explain_interval_time;
+ struct PlanState *progressive_explain_current_node;
} EState;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 6a2f64c54f..b6ab577370 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 840b0fe57f..faa5118c58 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -274,6 +274,13 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT bool progressive_explain;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_output_size;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT double progressive_explain_sample_rate;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
--
2.39.5 (Apple Git-154)
On Sun, Dec 29, 2024 at 8:19 PM Rafael Thofehrn Castro <rafaelthca@gmail.com>
wrote:
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
Maybe track_explain instead? In the spirit of track_activity.
- progressive_explain_output_size: max output size of the plan printed in
the in-memory hash table.
- default: 4096
- min: 100
4096 seems low, if this means the explain plan is truncated at that size.
Also, the 100 minimum seems arbitrary.
So we can enable verbose and settings - but not wal? I could see that one
being useful. Not so much the rest (timing, summary). And buffers has
recently changed, so no need to worry about that. :)
- The plans are stored in a shared hash object (explainArray) allocated at
database start, similar to procArray. ExplainHashShmemSize() computes
shared memory needed for it, based on max_connections +
max_parallel_workers for the amount of elements in the array and
progressive_explain_output_size for the size per element.
Hmmm...don't have a solution/suggestion offhand, but using max_connections
would seem to be allocating a chunk of memory that is never used 99% of the
time, as most people don't run active queries near max_connections.
(Actually, on re-reading my draft, I would prefer a rotating pool like
pg_stat_statements does.)
- Column explain from pg_stat_progress_explain can only be visualized by
superusers or the same role that is running the query. If none of those
conditions are met, users will see "<insufficient privilege>".
Or pg_read_all_stats I presume? Are those other columns (e.g.
explain_count) being visible to anyone really useful, or can we throw them
all behind the same permission restriction?
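For reference, based on the description above, a session that is neither a
superuser nor the role running the query would presumably see something like
this (illustrative output only):

SELECT pid, explain_count, explain FROM pg_stat_progress_explain;
  pid   | explain_count |         explain
--------+---------------+--------------------------
 299663 |             5 | <insufficient privilege>

So the metadata columns stay visible while only the plan text is hidden,
hence the question.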
- From (B) we see that using progressive explain slightly increases total
execution time.
Is this using the default dirt-simple pgbench queries? What about queries
that generate very large explain plans?
- Do the columns in pg_stat_progress_explain make sense? Are we missing or
adding unnecessary columns?
Perhaps having the interval and sample rate in here as well, since they are
user-level and thus could be different from other rows in the view. It is
tempting to throw in other things as well like the query_start and datname,
but we don't want to copy all of pg_stat_activity...
It's not clear if total_explain_time is now() - query_start or something
else. If not, I would love to see an elapsed time interval column.
Perhaps add a leader_pid column. That's something I would always be joining
with pg_stat_activity to find out.
- Do we want progressive explain to print plans of regular queries started
without EXPLAIN if progressive_explain is enabled or should
the feature be restricted to instrumented queries (EXPLAIN ANALYZE)?
The latter, but no strong opinion.
id="guc-progressive-explain"
The new docs should mention the view name here, IMO, in addition to the
existing link that has details.
Random idea: track_explain_min_duration
Looks very cool overall, +1.
Cheers,
Greg
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them with a new view: pg_stat_progress_explain.
Thanks for this thread and for sharing the presentation
material. +1 for the idea of adding instrumentation that
will help users understand the bottlenecks in execution
plans. I want to share my perspective on this topic.
A DBA will want to know:
1/ Where is the bottleneck for a long-running query currently
in flight?
2/ For an OLTP workload with many quick plans that
could be further optimized: what plan, and what
part of the plan, is contributing to the database load?
Having a view like pg_stat_progress_explain (maybe a more
appropriate name is pg_stat_progress_plan) will be
extremely useful, allowing a user to build monitoring
dashboards that can answer such questions.
I do not think however this instrumentation should only be
made available if a user runs EXPLAIN ANALYZE.
In my opinion, this will severely limit the usefulness of this
instrumentation in production. Of course, one can use auto_explain,
but users will be hesitant to enable auto_explain with analyze in
production for all their workloads. Also, there should not be an
auto_explain dependency for this feature.
One approach would be for the view to expose the
explain plan and the current node being executed. I think the
plan_node_id can be exposed for this purpose, but I have not looked
into this in much detail yet. The plan_node_id can then be used
to locate the part of the plan that is a potential bottleneck (if that
plan node is the one constantly being called).
This may also be work that is better suited for an extension, but
core will need to add a hook in ExecProcNode so an extension can
have access to PlanState.
Regards,
Sami Imseih
Amazon Web Services (AWS)
hi.
[48/208] Compiling C object
contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o
FAILED: contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o
/usr/local/gcc-14.1.0/bin/gcc-14.1.0
-Icontrib/postgres_fdw/postgres_fdw.so.p -Isrc/include
-I../../Desktop/pg_src/src5/postgres/src/include
-Isrc/interfaces/libpq
-I../../Desktop/pg_src/src5/postgres/src/interfaces/libpq
-I/usr/include/libxml2 -fdiagnostics-color=always --coverage
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE
-Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3
-Wcast-function-type -Wshadow=compatible-local -Wformat-security
-Wdeclaration-after-statement -Wmissing-variable-declarations
-Wno-format-truncation -Wno-stringop-truncation -Wunused-variable
-Wuninitialized -Werror=maybe-uninitialized -Wreturn-type
-DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES
-DREALLOCATE_BITMAPSETS -DLOCK_DEBUG -DRELCACHE_FORCE_RELEASE
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
-DRAW_EXPRESSION_COVERAGE_TEST -fno-omit-frame-pointer -fPIC -pthread
-fvisibility=hidden -MD -MQ
contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o -MF
contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o.d -o
contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o -c
../../Desktop/pg_src/src5/postgres/contrib/postgres_fdw/postgres_fdw.c
In file included from
../../Desktop/pg_src/src5/postgres/contrib/postgres_fdw/postgres_fdw.c:22:
../../Desktop/pg_src/src5/postgres/src/include/commands/explain.h:86:9:
error: unknown type name ‘TimestampTz’
86 | TimestampTz last_explain;
| ^~~~~~~~~~~
[58/188] Linking target contrib/sslinfo/sslinfo.so
ninja: build stopped: subcommand failed.
compile failed. the above is the error message.
Thanks Greg, Sami and Jian for the feedback so far.
Maybe track_explain instead? In the spirit of track_activity.
That was the original name, and all other GUCs were following the
track_activity_* logic. Changed to the name of the feature after
discussion with colleagues at EDB. This is definitely open for
discussion.
4096 seems low, if this means the explain plan is truncated at
that size. Also, the 100 minimum seems arbitrary.
Min (100) and max (1048576) are the same as the values for GUC
track_activity_query_size, which has a very similar purpose: controls
the size of pg_stat_activity.query column.
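As an illustration (values made up), both are adjusted the same way and both
require a restart, since they are postmaster-context parameters:

ALTER SYSTEM SET progressive_explain_output_size = 65536;
ALTER SYSTEM SET track_activity_query_size = 65536;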
So we can enable verbose and settings - but not wal? I could see
that one being useful. Not so much the rest (timing, summary). And
buffers has recently changed, so no need to worry about that. :)
The logic I used for adding GUCs that control explain options is that
none of these settings should change QueryDesc->instrument_options,
which would change instrumentation options added to the actual
execution. The available GUCs modify only the ExplainState object, which
affects only the output printed to pg_stat_progress_explain.
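As a quick sketch, all of the following change only how the plan is rendered
in pg_stat_progress_explain, never what the executor instruments:

SET progressive_explain_format = 'json';
SET progressive_explain_verbose = on;
SET progressive_explain_settings = on;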
Hmmm...don't have a solution/suggestion offhand, but using max_connections
would seem to be allocating a chunk of memory that is never
used 99% of the time, as most people don't run active queries
near max_connections.
(Actually, on re-reading my draft, I would prefer a rotating pool
like pg_stat_statements does.)
Agreed. I thought about using a logic similar to pg_stat_statements,
but the pool size there is very large by default, 5000. The difference
is that pg_stat_statements keeps the data on disk and I wanted to
avoid that, as instrumented runs can print new plans very often,
affecting performance.
Maybe one idea would be to include a new GUC (progressive_explain_max_size)
that controls how many rows explainArray can have. If the limit is reached,
a backend won't print anything in that iteration.
It's not clear if total_explain_time is now() - query_start or something
else. If not, I would love to see an elapsed time interval column.
total_explain_time is the accumulated time spent only on printing
the plan. It does not include execution time.
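If useful, the average cost per print can be derived directly from the view,
something like (sketch):

SELECT pid,
       explain_count,
       total_explain_time,
       total_explain_time::numeric / NULLIF(explain_count, 0) AS avg_print_ms
FROM pg_stat_progress_explain;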
Perhaps add a leader_pid column. That's something I would always
be joining with pg_stat_activity to find out.
For prints done by parallel workers? That information is available
in pg_stat_activity. The idea is to use the pid column and join with
pg_stat_activity to get all other relevant details.
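Something along these lines (illustrative), which also covers the elapsed
time and leader_pid suggestions:

SELECT a.datname,
       a.leader_pid,
       a.query_start,
       now() - a.query_start AS elapsed,
       e.last_explain,
       e.explain_count,
       e.explain
FROM pg_stat_progress_explain e
JOIN pg_stat_activity a USING (pid);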
I do not think however this instrumentation should only be
made available if a user runs EXPLAIN ANALYZE.
In my opinion, this will severely limit the usefulness of this
instrumentation in production. Of course, one can use auto_explain,
but users will be hesitant to enable auto_explain with analyze in
production for all their workloads. Also, there should not be an
auto_explain dependency for this feature.
One approach would be for the view to expose the
explain plan and the current node being executed. I think the
plan_node_id can be exposed for this purpose, but I have not looked
into this in much detail yet. The plan_node_id can then be used
to locate the part of the plan that is a potential bottleneck (if that
plan node is the one constantly being called).
You mean that we could include the current node being executed even
for non-instrumented runs? In that case it would print the plain
plan + current node? That is a valid point and shouldn't be
difficult to implement. The problem is that this would require
adding overhead to ExecProcNode() (non-instrumented) and that
can be a performance killer.
This may also be work that is better suited for an extension, but
core will need to add a hook in ExecProcNode so an extension can
have access to PlanState.
Are you talking about implementing your proposal (also printing the
plan with the current node for non-instrumented runs) as an extension,
or this whole patch as an extension?
If the whole patch, I thought about that. The thing is that the
proposal also changes the ExplainNode() function, the core function
to print a plan. To implement it as an extension we would have to
duplicate 95% of that code.
I do think there is merit in having this feature as part of
core and using existing extensions (auto_explain, for example) to
extend it, like adding your suggestion of a hook in
ExecProcNode().
compile failed. the above is the error message.
Thanks. It was indeed missing an include. It complained only for
a complete build (including contrib), so I failed to catch it.
Sending a second version with the fix.
Rafael.
Attachments:
v2-0001-Proposal-for-progressive-explains.patch
From 67ba6949d928db3a1a8f41163f10b71c94571ade Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Tue, 31 Dec 2024 15:14:03 +0000
Subject: [PATCH v2] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
For regular queries or queries started with EXPLAIN (without ANALYZE)
the plan is printed only once at the start.
For instrumented runs (started via EXPLAIN ANALYZE or when auto_explain
flag log_analyze is enabled), the plan is printed on a fixed interval
controlled by the new GUC parameter progressive_explain_interval, including
all instrumentation stats computed so far (per-node rows and execution
time).
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_explain: timestamp when plan was last printed
- explain_count: number of times the plan was printed
- total_explain_time: accumulated time spent printing plans (in ms)
- explain: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: bool
- default: off
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_sample_rate: fraction of rows processed by the
query until progressive_explain_interval is evaluated to print a
progressive plan
- type: floating point
- default: 0.01
- range: 0.0 - 1.0
- context: user
- progressive_explain_output_size: max output size of the plan
printed in the in-memory hash table.
- type: int
- default: 4096
- min: 100
- context: postmaster
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
---
doc/src/sgml/config.sgml | 109 ++++
doc/src/sgml/monitoring.sgml | 84 +++
doc/src/sgml/perform.sgml | 100 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/commands/explain.c | 479 ++++++++++++++++--
src/backend/executor/execMain.c | 12 +
src/backend/executor/execProcnode.c | 35 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/misc/guc_tables.c | 92 ++++
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 22 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 5 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 7 +
17 files changed, 938 insertions(+), 52 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a84e60c09b..bde7631268 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8462,6 +8462,115 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled; see
+ <xref linkend="using-explain-progressive"/>. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-output-size" xreflabel="progressive_explain_output_size">
+ <term><varname>progressive_explain_output_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_output_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the amount of memory reserved to store the text of the
+ progressive explain for each client backend or parallel worker, for the
+ <structname>pg_stat_progress_explain</structname>.<structfield>explain</structfield>
+ field. If this value is specified without units, it is taken as bytes.
+ The default value is <literal>4096 bytes</literal>.
+ This parameter can only be set at server start.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-sample-rate" xreflabel="progressive_explain_sample_rate">
+ <term><varname>progressive_explain_sample_rate</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_sample_rate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Fraction of rows processed by the query until
+ <xref linkend="guc-progressive-explain-interval"/> is evaluated
+ to print a progressive explain plan. The default value is
+ <literal>0.01</literal>, resulting in 1 check every 100 processed
+ rows.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..d2beb91893 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6688,6 +6688,90 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_explain</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>explain_count</structfield> <type>integer</type>
+ </para>
+ <para>
+ Number of times the plan was printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_explain_time</structfield> <type>floating point</type>
+ </para>
+ <para>
+ Accumulated time spent printing plans (in ms).
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>explain</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text. By default the explain text is
+ truncated at 4096 bytes; this value can be changed via the
+ parameter <xref linkend="guc-progressive-explain-output-size"/>.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index cd12b9ce48..dd2d21edb3 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1091,6 +1091,106 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query, or
+ detailed plan with row counts and accumulated run time when
+ <command>EXPLAIN ANALYZE</command> is used, can be visualized by any
+ session via <xref linkend="pg-stat-progress-explain-view"/> view when
+ <xref linkend="guc-progressive-explain"/> is enabled in the client
+ backend or parallel worker executing the query. Settings
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/> and
+ <xref linkend="guc-progressive-explain-settings"/> can be adjusted
+ to customize the printed plan, whose length is limited by
+ <xref linkend="guc-progressive-explain-output-size"/>.
+ </para>
+
+ <para>
+ For queries executed without <command>EXPLAIN ANALYZE</command> the
+ plan is printed only once at the beginning of query execution:
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = on;
+SET
+
+SELECT *
+FROM test t1
+INNER JOIN test t2 ON (t1.c1 = t2.c1);
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT pid, explain_count, explain FROM pg_stat_progress_explain;
+ pid | explain_count | explain
+-----+---------------+--------------------------------------------------------------------------------
+ 159 | 1 | Hash Join (cost=1159375.00..3912500.00 rows=30000000 width=74)
+ | | Hash Cond: (t1.c1 = t2.c1)
+ | | -> Seq Scan on test t1 (cost=0.00..550000.00 rows=30000000 width=37)
+ | | -> Hash (cost=550000.00..550000.00 rows=30000000 width=37)
+ | | -> Seq Scan on test t2 (cost=0.00..550000.00 rows=30000000 width=37)
+ | |
+</screen>
+ </para>
+
+ <para>
+ When <command>EXPLAIN ANALYZE</command> is used the detailed plan is
+ printed progressively based on
+ <xref linkend="guc-progressive-explain-interval"/> and
+ <xref linkend="guc-progressive-explain-sample-rate"/> settings, including
+ per-node accumulated row counts and run-time statistics computed so far. This
+ makes progressive explain a powerful ally when investigating bottlenecks in
+ expensive queries without having to wait for <command>EXPLAIN ANALYZE</command>
+ to finish.
+ </para>
+
+ <para>
+ Progressive explains include additional information per node to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = on;
+SET
+
+EXPLAIN ANALYZE SELECT *
+FROM test t1
+INNER JOIN test t2 ON (t1.c1 = t2.c1);
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT pid, explain_count, explain FROM pg_stat_progress_explain;
+ pid | explain_count | explain
+-----+---------------+----------------------------------------------------------------------------------------------------------------------------------------------
+ 159 | 7 | Hash Join (cost=1159375.00..3912500.00 rows=30000000 width=74) (never executed)
+ | | Hash Cond: (t1.c1 = t2.c1)
+ | | -> Seq Scan on test t1 (cost=0.00..550000.00 rows=30000000 width=37) (actual time=0.009..0.009 rows=1 loops=1)
+ | | -> Hash (cost=550000.00..550000.00 rows=30000000 width=37) (never executed)
+ | | -> Seq Scan on test t2 (cost=0.00..550000.00 rows=30000000 width=37) (actual time=0.004..2165.201 rows=27925599 loops=1) (current)
+ | |
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index da9a8fe99f..4021b1ee6b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1325,6 +1325,11 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7c0fd63b2f..1f37ec755d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -13,6 +13,8 @@
*/
#include "postgres.h"
+#include <time.h>
+
#include "access/xact.h"
#include "catalog/pg_type.h"
#include "commands/createas.h"
@@ -22,6 +24,8 @@
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
+#include "miscadmin.h"
+#include "funcapi.h"
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -40,6 +44,12 @@
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+#include "utils/backend_status.h"
+#include "storage/procarray.h"
+#include "executor/spi.h"
+#include "utils/guc.h"
+
+
/* Hook for plugins to get control in ExplainOneQuery() */
@@ -48,6 +58,8 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *explainArray = NULL;
/* Instrumentation data for SERIALIZE option */
typedef struct SerializeMetrics
@@ -140,7 +152,7 @@ static void show_hashagg_info(AggState *aggstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -180,6 +192,8 @@ static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
static void escape_yaml(StringInfo buf, const char *str);
static SerializeMetrics GetSerializationMetrics(DestReceiver *dest);
+void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ExplainTrackQueryReleaseFunc(void *);
@@ -385,6 +399,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -1497,6 +1513,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1960,24 +1977,56 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is applied directly to the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *planstate->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
if (es->timing)
+ /* Node in progress */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f) (current)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ /* Node in progress */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
appendStringInfo(es->str,
- " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
- startup_ms, total_ms, rows, nloops);
+ " (actual rows=%.0f loops=%.0f) (current)",
+ rows, nloops);
else
appendStringInfo(es->str,
" (actual rows=%.0f loops=%.0f)",
@@ -1992,6 +2041,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
3, es);
}
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
+ {
+ ExplainPropertyBool("Current", true, es);
+ }
ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
}
@@ -2100,29 +2154,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -2133,11 +2187,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2154,7 +2208,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2165,7 +2219,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2189,7 +2243,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2223,7 +2277,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2237,7 +2291,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2255,7 +2309,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2272,14 +2326,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2289,7 +2343,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2299,11 +2353,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2312,11 +2366,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2325,11 +2379,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2337,13 +2391,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(((WindowAgg *) plan)->runConditionOrig,
"Run Condition", planstate, ancestors, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
@@ -2353,7 +2407,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2375,7 +2429,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2420,10 +2474,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -2562,6 +2616,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainCloseGroup("Plan",
relationship ? NULL : "Plan",
true, es);
+
+ /* Progressive explain. Free cloned instrumentation object */
+ if (es->progressive && local_instr)
+ {
+ pfree(local_instr);
+ }
}
/*
@@ -3940,19 +4000,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4618,7 +4678,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4627,11 +4687,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4639,6 +4712,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
insert_path, 0, es);
ExplainPropertyFloat("Conflicting Tuples", NULL,
other_path, 0, es);
+
+ /* Progressive explain. Free cloned instrumentation object */
+ if (es->progressive)
+ pfree(local_instr);
}
}
else if (node->operation == CMD_MERGE)
@@ -4651,11 +4728,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -4686,6 +4776,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
ExplainPropertyFloat("Tuples Deleted", NULL, delete_path, 0, es);
ExplainPropertyFloat("Tuples Skipped", NULL, skipped_path, 0, es);
}
+
+ /* Progressive explain. Free cloned instrumentation object */
+ if (es->progressive)
+ pfree(local_instr);
}
}
@@ -5910,3 +6004,290 @@ GetSerializationMetrics(DestReceiver *dest)
return empty;
}
+
+
+/*
+ * ProgressiveExplainPrint
+ * Prints progressive explain in memory.
+ *
+ * This operation needs to be done in a dedicated memory context
+ * as plans for instrumented runs will be printed multiple times
+ * and instrumentation objects need to be cloned so that stats
+ * can get updated without interfering with original objects.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ MemoryContext tmpCxt;
+ MemoryContext oldCxt;
+ instr_time starttime;
+
+ INSTR_TIME_SET_CURRENT(starttime);
+
+ /* Dedicated memory context for the current plan being printed */
+ tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Progressive Explain Temporary Context",
+ ALLOCSET_DEFAULT_SIZES);
+ oldCxt = MemoryContextSwitchTo(tmpCxt);
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ explainHashKey key;
+ explainHashEntry *entry;
+ ExplainState *es;
+
+ es = NewExplainState();
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+ /* Instrumentation details come from the active QueryDesc */
+ es->analyze = queryDesc->instrument_options;
+ es->buffers = (queryDesc->instrument_options &
+ INSTRUMENT_BUFFERS) != 0;
+ es->wal = (queryDesc->instrument_options &
+ INSTRUMENT_WAL) != 0;
+ es->timing = (queryDesc->instrument_options &
+ INSTRUMENT_TIMER) != 0;
+ es->summary = (es->analyze);
+
+ /* Additional options come from progressive GUC settings */
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ key.pid = MyProcPid;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ entry = (explainHashEntry *) hash_search(explainArray, &key, HASH_FIND, NULL);
+
+
+ if (entry)
+ {
+ entry->explain_count++;
+ strncpy(entry->plan, es->str->data, progressive_explain_output_size);
+ entry->explain_duration += elapsed_time(&starttime);
+ entry->last_explain = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ExplainHashLock);
+
+ /*
+ * Free local explain state before exiting as this function may be
+ * called multiple times in the same memory context.
+ */
+ pfree(es->str);
+ pfree(es);
+ }
+
+ /* Clean up temp context */
+ MemoryContextSwitchTo(oldCxt);
+ MemoryContextDelete(tmpCxt);
+}
+
+/*
+ * ProgressiveExplainBegin
+ * Enables progressive explain progress tracking for a query in the local backend.
+ *
+ * A progressive explain is printed at the beginning of every query if progressive_explain
+ * is enabled.
+ *
+ * For instrumented runs started with EXPLAIN ANALYZE the progressive plan is printed
+ * via ExecProcNodeInstrExplain at a regular interval controlled by progressive_explain_interval.
+ *
+ * Plans are stored in shared memory object explainArray that needs to be properly
+ * cleared when the query finishes or gets canceled. This is achieved with the help
+ * of a memory context callback configured in the same memory context where the query
+ * descriptor was created. This strategy allows cleaning explainArray even when the
+ * query doesn't finish gracefully.
+ */
+void
+ProgressiveExplainBegin(QueryDesc *queryDesc)
+{
+ explainHashKey key;
+ explainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ExplainTrackQueryReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ INSTR_TIME_SET_CURRENT(queryDesc->estate->progressive_explain_interval_time);
+
+ key.pid = MyProcPid;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ entry = (explainHashEntry *) hash_search(explainArray, &key, HASH_ENTER, &found);
+ entry->pid = MyProcPid;
+ entry->explain_count = 0;
+ entry->explain_duration = 0.0f;
+ strcpy(entry->plan, "");
+ entry->last_explain = 0;
+
+ LWLockRelease(ExplainHashLock);
+
+ /* Printing progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /*
+ * Update the explain plan only if progressive_explain_interval has passed since the previous print.
+ */
+ if (elapsed_time(&node->state->progressive_explain_interval_time) * 1000.0 > progressive_explain_interval)
+ {
+ node->state->progressive_explain_current_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->progressive_explain_current_node = NULL;
+ INSTR_TIME_SET_CURRENT(node->state->progressive_explain_interval_time);
+ }
+}
+
+/*
+ * ExplainTrackQueryReleaseFunc
+ * Memory context release callback function to remove
+ * plan from explain hash.
+ */
+static void
+ExplainTrackQueryReleaseFunc(void *)
+{
+ /* Remove row from hash */
+ explainHashKey key;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ hash_search(explainArray, &key, HASH_REMOVE, NULL);
+ LWLockRelease(ExplainHashLock);
+}
+
+/*
+ * ExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends, max_parallel_workers);
+
+ size = add_size(size, hash_estimate_size(max_table_size, add_size(sizeof(explainHashEntry), progressive_explain_output_size)));
+
+ return size;
+}
+
+/*
+ * InitExplainHash
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(explainHashKey);
+ info.entrysize = sizeof(explainHashEntry) + progressive_explain_output_size;
+
+ explainArray = ShmemInitHash("explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+ char duration[1024];
+#define EXPLAIN_ACTIVITY_COLS 5
+ HASH_SEQ_STATUS hash_seq;
+ explainHashEntry *entry;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, explainArray);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ values[0] = entry->pid;
+ values[1] = TimestampTzGetDatum(entry->last_explain);
+ values[2] = entry->explain_count;
+ sprintf(duration, "%.3f", 1000.0 * entry->explain_duration);
+ values[3] = CStringGetTextDatum(duration);
+
+ if (superuser())
+ values[4] = CStringGetTextDatum(entry->plan);
+ else
+ {
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ bool found = false;
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == entry->pid)
+ {
+ found = true;
+ if (beentry->st_userid == GetUserId())
+ values[4] = CStringGetTextDatum(entry->plan);
+ else
+ values[4] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+
+ if (!found)
+ values[4] = CStringGetTextDatum("<insufficient privilege>");
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ }
+ LWLockRelease(ExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5ca856fd27..e34c8f03fe 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -61,6 +61,7 @@
#include "utils/partcache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
+#include "commands/explain.h"
/* Hooks for plugins to get control in ExecutorStart/Run/Finish/End */
@@ -174,6 +175,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Adding back reference to QueryDesc
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -260,6 +266,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain)
+ ProgressiveExplainBegin(queryDesc);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 34f28dfece..16f1407633 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -118,9 +118,13 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "commands/explain.h"
+#include "utils/guc.h"
+#include "common/pg_prng.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
+static TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
static bool ExecShutdownNode_walker(PlanState *node, void *context);
@@ -461,8 +465,12 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ if (progressive_explain)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
else
node->ExecProcNode = node->ExecProcNodeReal;
@@ -489,6 +497,31 @@ ExecProcNodeInstr(PlanState *node)
return result;
}
+/*
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+static TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update progressive explain based on sampling.
+ */
+ if (pg_prng_double(&pg_global_prng_state) < progressive_explain_sample_rate)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
/* ----------------------------------------------------------------
* MultiExecProcNode
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 268ae8a945..244c3591a1 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..25e70c63d5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "commands/explain.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up instrumented explain hash table
+ */
+ InitExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 16144c2b72..da08e7f3c9 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -345,6 +345,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ExplainHash "Waiting to access backend explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8a67f01200..b276955603 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -46,6 +46,7 @@
#include "commands/vacuum.h"
#include "common/file_utils.h"
#include "common/scram-common.h"
+#include "commands/explain.h"
#include "jit/jit.h"
#include "libpq/auth.h"
#include "libpq/libpq.h"
@@ -474,6 +475,14 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -528,6 +537,13 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+bool progressive_explain = false;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
+int progressive_explain_output_size = 4096;
+double progressive_explain_sample_rate = 0.01;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2076,6 +2092,39 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_verbose", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Controls whether information about modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3714,6 +3763,28 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_output_size", PGC_POSTMASTER, QUERY_TUNING_METHOD,
+ gettext_noop("Sets the size reserved for pg_stat_progress_explain.explain, in bytes."),
+ NULL
+ },
+ &progressive_explain_output_size,
+ 4096, 100, 1048576,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -3995,6 +4066,17 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_sample_rate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Fraction of rows processed by the query until progressive_explain_interval is evaluated "
+ "to print a progressive plan."),
+ gettext_noop("Use a value between 0.0 (never) and 1.0 (always).")
+ },
+ &progressive_explain_sample_rate,
+ 0.01, 0.0, 1.0,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0.0, 0.0, 0.0, NULL, NULL, NULL
@@ -5207,6 +5289,16 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..dbc1185cfc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12402,4 +12402,14 @@
proargtypes => 'int2',
prosrc => 'gist_stratnum_identity' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'bool',
+ proallargtypes => '{bool,int4,timestamptz,int4,text,text}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{mode,pid,last_explain,explain_count,total_explain_time,explain}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index aa5872bc15..127777b174 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -16,6 +16,7 @@
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
+#include "datatype/timestamp.h"
typedef enum ExplainSerializeOption
{
@@ -67,12 +68,28 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
} ExplainState;
+typedef struct explainHashKey
+{
+ int pid; /* PID */
+} explainHashKey;
+
+typedef struct explainHashEntry
+{
+ explainHashKey key; /* hash key of entry - MUST BE FIRST */
+ int pid;
+ TimestampTz last_explain;
+ int explain_count;
+ float explain_duration;
+ char plan[];
+} explainHashEntry;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -144,4 +161,9 @@ extern void ExplainCloseGroup(const char *objtype, const char *labelname,
extern DestReceiver *CreateExplainSerializeDestReceiver(ExplainState *es);
+extern void ProgressiveExplainBegin(QueryDesc *queryDesc);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern Size ExplainHashShmemSize(void);
+extern void InitExplainHash(void);
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index bfd7b6d844..2963a70e41 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -108,6 +108,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 182a6956bb..c57b4c28d2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -56,6 +56,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -735,6 +736,10 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ struct QueryDesc *query_desc;
+ instr_time progressive_explain_interval_time;
+ struct PlanState *progressive_explain_current_node;
} EState;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 6a2f64c54f..b6ab577370 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 840b0fe57f..faa5118c58 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -274,6 +274,13 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT bool progressive_explain;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_output_size;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT double progressive_explain_sample_rate;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
--
2.39.5 (Apple Git-154)
Hi.
All the newly added GUCs
progressive_explain;
progressive_explain_verbose;
progressive_explain_settings;
progressive_explain_interval;
progressive_explain_output_size;
progressive_explain_format;
progressive_explain_sample_rate;
also need to be added to postgresql.conf.sample?
In doc/src/sgml/monitoring.sgml, we also need to add the
view pg_stat_progress_explain
to the section
<table id="monitoring-stats-dynamic-views-table">
<title>Dynamic Statistics Views</title>
(Table 27.1. Dynamic Statistics Views)
pg_stat_progress_explain.explain will be truncated after 4096 bytes
(the default value of progressive_explain_output_size).
So if progressive_explain_format is json and the plan is bigger
(imagine two partitioned tables joined together, each having many
partitions), the "explain" column text may not be valid JSON.
Should we be concerned about this?
I don't really understand the actual usage of
pg_stat_progress_explain.explain_count.
The usage of the other columns makes sense to me.
Can you share your reasoning for why we need this column?
select name, category from pg_settings
where category = 'Query Tuning / Planner Method Configuration';
you will see that for config_group QUERY_TUNING_METHOD
all the GUC names generally begin with "enable".
Setting the config_group of all the GUCs whose names begin with
"progressive" to QUERY_TUNING_METHOD may not be appropriate; it is also
not related to query tuning. A possible regrouping is sketched below.
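To make that concrete, here is a minimal sketch of how the
progressive_explain entry in guc_tables.c could be regrouped.
STATS_MONITORING is only an assumption for illustration; the actual
group is up for discussion:

    /* sketch: regroup under STATS_MONITORING instead of QUERY_TUNING_METHOD */
    {
        {"progressive_explain", PGC_USERSET, STATS_MONITORING,
            gettext_noop("Enables progressive explains."),
            gettext_noop("Explain output is visible via pg_stat_progress_explain."),
            GUC_EXPLAIN
        },
        &progressive_explain,
        false,
        NULL, NULL, NULL
    },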
#include "utils/backend_status.h"
#include "storage/procarray.h"
#include "executor/spi.h"
#include "utils/guc.h"
In src/backend/commands/explain.c, the headers should generally be
sorted in ascending alphabetical order (see the sketch after this
paragraph). The same ordering applies to ipci.c, execMain.c, and execProcnode.c.
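For example, in explain.c the new includes would be merged into the
existing alphabetical list rather than appended after "utils/xml.h",
roughly like this (added lines marked with +, untouched lines elided):

 #include "access/xact.h"
 #include "catalog/pg_type.h"
 #include "commands/createas.h"
 ...
+#include "executor/spi.h"
+#include "funcapi.h"
 #include "jit/jit.h"
 #include "libpq/pqformat.h"
 #include "libpq/protocol.h"
+#include "miscadmin.h"
 #include "nodes/extensible.h"
 ...
+#include "storage/procarray.h"
+#include "utils/backend_status.h"
+#include "utils/guc.h"
 #include "utils/tuplesort.h"
 #include "utils/typcache.h"
 #include "utils/xml.h"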
else
/* Node in progress */
if (es->progressive && planstate ==
planstate->state->progressive_explain_current_node)
appendStringInfo(es->str,
" (actual rows=%.0f loops=%.0f) (current)",
rows, nloops);
else
appendStringInfo(es->str,
" (actual rows=%.0f loops=%.0f)",
rows, nloops);
In the above part of ExplainNode in src/backend/commands/explain.c, the
indentation looks wrong to me; a braced sketch of the same logic follows below.
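For what it's worth, a braced version of the same logic (just a sketch,
indentation per pgindent not verified) would make the nesting easier to read:

            else
            {
                /* Node in progress */
                if (es->progressive &&
                    planstate == planstate->state->progressive_explain_current_node)
                    appendStringInfo(es->str,
                                     " (actual rows=%.0f loops=%.0f) (current)",
                                     rows, nloops);
                else
                    appendStringInfo(es->str,
                                     " (actual rows=%.0f loops=%.0f)",
                                     rows, nloops);
            }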
Sending rebased version to fix cfbot tests.
Attachments:
v3-0001-Proposal-for-progressive-explains.patch
From d8d42cf44235d664dc16492e601b5c3e97757bcd Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Tue, 7 Jan 2025 19:32:16 +0000
Subject: [PATCH v3] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize via new view pg_stat_progress_explain.
Plans are only printed if new GUC parameter progressive_explain is
enabled.
For regular queries or queries started with EXPLAIN (without ANALYZE)
the plan is printed only once at the start.
For instrumented runs (started via EXPLAIN ANALYZE or when auto_explain
flag log_analyze is enabled) the plan is printed on a fixed interval
controlled by new GUC parameter progressive_explain_interval including
all instrumentation stats computed so far (per node rows and execution
time).
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_explain: timestamp when plan was last printed
- explain_count: amount of times plan was printed
- total_explain_time: accumulated time spent printing plans (in ms)
- explain: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: bool
- default: off
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_sample_rate: fraction of rows processed by the
query until progressive_explain_interval is evaluated to print a
progressive plan
- type: floating point
- default: 0.1
- range: 0.0 - 1.0
- context: user
- progressive_explain_output_size: max output size of the plan
printed in the in-memory hash table.
- type: int
- default: 4096
- min: 100
- context: postmaster
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
---
doc/src/sgml/config.sgml | 109 ++++
doc/src/sgml/monitoring.sgml | 84 +++
doc/src/sgml/perform.sgml | 100 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/commands/explain.c | 479 ++++++++++++++++--
src/backend/executor/execMain.c | 12 +
src/backend/executor/execProcnode.c | 35 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/misc/guc_tables.c | 92 ++++
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 22 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 5 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 7 +
17 files changed, 938 insertions(+), 52 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 740ff5d504..b8f5cb72ff 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8502,6 +8502,115 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled; see
+ <xref linkend="using-explain-progressive"/>. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-output-size" xreflabel="progressive_explain_output_size">
+ <term><varname>progressive_explain_output_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_output_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the amount of memory reserved to store the text of the
+ progressive explain for each client backend or parallel worker, for the
+ <structname>pg_stat_progress_explain</structname>.<structfield>explain</structfield>
+ field. If this value is specified without units, it is taken as bytes.
+ The default value is <literal>4096 bytes</literal>.
+ This parameter can only be set at server start.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-sample-rate" xreflabel="progressive_explain_sample_rate">
+ <term><varname>progressive_explain_sample_rate</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_sample_rate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Fraction of processed rows for which
+ <xref linkend="guc-progressive-explain-interval"/> is evaluated
+ to decide whether to print a progressive explain plan. The default
+ value is <literal>0.01</literal>, resulting in one check every 100
+ processed rows on average.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d0d176cc54..2ce928fa36 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6727,6 +6727,90 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_explain</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when the plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>explain_count</structfield> <type>integer</type>
+ </para>
+ <para>
+ Number of times the plan was printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_explain_time</structfield> <type>floating point</type>
+ </para>
+ <para>
+ Accumulated time spent printing plans (in ms).
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>explain</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text. By default the explain text is
+ truncated at 4096 bytes; this value can be changed via the
+ parameter <xref linkend="guc-progressive-explain-output-size"/>.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index a502a2aaba..4f09361829 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1109,6 +1109,106 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query, or
+ the detailed plan with row counts and accumulated run time when
+ <command>EXPLAIN ANALYZE</command> is used, can be visualized by any
+ session via the <xref linkend="pg-stat-progress-explain-view"/> view when
+ <xref linkend="guc-progressive-explain"/> is enabled in the client
+ backend or parallel worker executing the query. The settings
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/> and
+ <xref linkend="guc-progressive-explain-settings"/> can be adjusted
+ to customize the printed plan, whose length is limited by
+ <xref linkend="guc-progressive-explain-output-size"/>.
+ </para>
+
+ <para>
+ For queries executed without <command>EXPLAIN ANALYZE</command> the
+ plan is printed only once at the beginning of query execution:
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = on;
+SET
+
+SELECT *
+FROM test t1
+INNER JOIN test t2 ON (t1.c1 = t2.c1);
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT pid, explain_count, explain FROM pg_stat_progress_explain;
+ pid | explain_count | explain
+-----+---------------+--------------------------------------------------------------------------------
+ 159 | 1 | Hash Join (cost=1159375.00..3912500.00 rows=30000000 width=74)
+ | | Hash Cond: (t1.c1 = t2.c1)
+ | | -> Seq Scan on test t1 (cost=0.00..550000.00 rows=30000000 width=37)
+ | | -> Hash (cost=550000.00..550000.00 rows=30000000 width=37)
+ | | -> Seq Scan on test t2 (cost=0.00..550000.00 rows=30000000 width=37)
+ | |
+</screen>
+ </para>
+
+ <para>
+ When <command>EXPLAIN ANALYZE</command> is used, the detailed plan is
+ printed progressively based on
+ <xref linkend="guc-progressive-explain-interval"/> and
+ <xref linkend="guc-progressive-explain-sample-rate"/> settings, including
+ per node accumulated row count and run time statistics computed so far. This
+ makes progressive explain a powerful ally when investigating bottlenecks in
+ expensive queries without having to wait for <command>EXPLAIN ANALYZE</command>
+ to finish.
+ </para>
+
+ <para>
+ Progressive explains include additional information per node to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = on;
+SET
+
+EXPLAIN ANALYZE SELECT *
+FROM test t1
+INNER JOIN test t2 ON (t1.c1 = t2.c1);
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT pid, explain_count, explain FROM pg_stat_progress_explain;
+ pid | explain_count | explain
+-----+---------------+----------------------------------------------------------------------------------------------------------------------------------------------
+ 159 | 7 | Hash Join (cost=1159375.00..3912500.00 rows=30000000 width=74) (never executed)
+ | | Hash Cond: (t1.c1 = t2.c1)
+ | | -> Seq Scan on test t1 (cost=0.00..550000.00 rows=30000000 width=37) (actual time=0.009..0.009 rows=1 loops=1)
+ | | -> Hash (cost=550000.00..550000.00 rows=30000000 width=37) (never executed)
+ | | -> Seq Scan on test t2 (cost=0.00..550000.00 rows=30000000 width=37) (actual time=0.004..2165.201 rows=27925599 loops=1) (current)
+ | |
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7a595c84db..c8de672547 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1325,6 +1325,11 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c24e66f82e..22578213cf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -13,6 +13,8 @@
*/
#include "postgres.h"
+#include <time.h>
+
#include "access/xact.h"
#include "catalog/pg_type.h"
#include "commands/createas.h"
@@ -22,6 +24,8 @@
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
+#include "miscadmin.h"
+#include "funcapi.h"
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -40,6 +44,12 @@
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+#include "utils/backend_status.h"
+#include "storage/procarray.h"
+#include "executor/spi.h"
+#include "utils/guc.h"
+
+
/* Hook for plugins to get control in ExplainOneQuery() */
@@ -48,6 +58,8 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *explainArray = NULL;
/* Instrumentation data for SERIALIZE option */
typedef struct SerializeMetrics
@@ -140,7 +152,7 @@ static void show_hashagg_info(AggState *aggstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -180,6 +192,8 @@ static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
static void escape_yaml(StringInfo buf, const char *str);
static SerializeMetrics GetSerializationMetrics(DestReceiver *dest);
+void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ExplainTrackQueryReleaseFunc(void *);
@@ -392,6 +406,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -1504,6 +1520,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1967,24 +1984,56 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is performed directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *planstate->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
if (es->timing)
+ /* Node in progress */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f) (current)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ /* Node in progress */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
appendStringInfo(es->str,
- " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
- startup_ms, total_ms, rows, nloops);
+ " (actual rows=%.0f loops=%.0f) (current)",
+ rows, nloops);
else
appendStringInfo(es->str,
" (actual rows=%.0f loops=%.0f)",
@@ -1999,6 +2048,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
3, es);
}
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
+ {
+ ExplainPropertyBool("Current", true, es);
+ }
ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
}
@@ -2107,29 +2161,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -2140,11 +2194,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2161,7 +2215,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2172,7 +2226,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2196,7 +2250,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2230,7 +2284,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2244,7 +2298,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2262,7 +2316,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2279,14 +2333,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2296,7 +2350,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2306,11 +2360,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2319,11 +2373,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2332,11 +2386,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2344,13 +2398,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(((WindowAgg *) plan)->runConditionOrig,
"Run Condition", planstate, ancestors, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
@@ -2360,7 +2414,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2382,7 +2436,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2427,10 +2481,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -2569,6 +2623,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainCloseGroup("Plan",
relationship ? NULL : "Plan",
true, es);
+
+ /* Progressive explain. Free cloned instrumentation object */
+ if (es->progressive && local_instr)
+ {
+ pfree(local_instr);
+ }
}
/*
@@ -3947,19 +4007,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4625,7 +4685,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4634,11 +4694,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4646,6 +4719,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
insert_path, 0, es);
ExplainPropertyFloat("Conflicting Tuples", NULL,
other_path, 0, es);
+
+ /* Progressive explain. Free cloned instrumentation object */
+ if (es->progressive)
+ pfree(local_instr);
}
}
else if (node->operation == CMD_MERGE)
@@ -4658,11 +4735,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -4693,6 +4783,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
ExplainPropertyFloat("Tuples Deleted", NULL, delete_path, 0, es);
ExplainPropertyFloat("Tuples Skipped", NULL, skipped_path, 0, es);
}
+
+ /* Progressive explain. Free cloned instrumentation object */
+ if (es->progressive)
+ pfree(local_instr);
}
}
@@ -5917,3 +6011,290 @@ GetSerializationMetrics(DestReceiver *dest)
return empty;
}
+
+
+/*
+ * ProgressiveExplainPrint
+ * Prints progressive explain in memory.
+ *
+ * This operation needs to be done in a dedicated memory context
+ * as plans for instrumented runs will be printed multiple times
+ * and instrumentation objects need to be cloned so that stats
+ * can get updated without interfering with original objects.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ MemoryContext tmpCxt;
+ MemoryContext oldCxt;
+ instr_time starttime;
+
+ INSTR_TIME_SET_CURRENT(starttime);
+
+ /* Dedicated memory context for the current plan being printed */
+ tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Progressive Explain Temporary Context",
+ ALLOCSET_DEFAULT_SIZES);
+ oldCxt = MemoryContextSwitchTo(tmpCxt);
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ explainHashKey key;
+ explainHashEntry *entry;
+ ExplainState *es;
+
+ es = NewExplainState();
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+ /* Instrumentation details come from the active QueryDesc */
+ es->analyze = queryDesc->instrument_options;
+ es->buffers = (queryDesc->instrument_options &
+ INSTRUMENT_BUFFERS) != 0;
+ es->wal = (queryDesc->instrument_options &
+ INSTRUMENT_WAL) != 0;
+ es->timing = (queryDesc->instrument_options &
+ INSTRUMENT_TIMER) != 0;
+ es->summary = (es->analyze);
+
+ /* Additional options come from progressive GUC settings */
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ key.pid = MyProcPid;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ entry = (explainHashEntry *) hash_search(explainArray, &key, HASH_FIND, NULL);
+
+
+ if (entry)
+ {
+ entry->explain_count++;
+ strncpy(entry->plan, es->str->data, progressive_explain_output_size);
+ entry->explain_duration += elapsed_time(&starttime);
+ entry->last_explain = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ExplainHashLock);
+
+ /*
+ * Free local explain state before exiting as this function may be
+ * called multiple times in the same memory context.
+ */
+ pfree(es->str);
+ pfree(es);
+ }
+
+ /* Clean up temp context */
+ MemoryContextSwitchTo(oldCxt);
+ MemoryContextDelete(tmpCxt);
+}
+
+/*
+ * ProgressiveExplainBegin
+ * Enables progressive explain progress tracking for a query in the local backend.
+ *
+ * A progressive explain is printed at the beginning of every query if progressive_explain
+ * is enabled.
+ *
+ * For instrumented runs started with EXPLAIN ANALYZE the progressive plan is printed
+ * via ExecProcNodeInstrExplain at a regular interval controlled by progressive_explain_interval.
+ *
+ * Plans are stored in shared memory object explainArray that needs to be properly
+ * cleared when the query finishes or gets canceled. This is achieved with the help
+ * of a memory context callback configured in the same memory context where the query
+ * descriptor was created. This strategy allows cleaning explainArray even when the
+ * query doesn't finish gracefully.
+ */
+void
+ProgressiveExplainBegin(QueryDesc *queryDesc)
+{
+ explainHashKey key;
+ explainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ExplainTrackQueryReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ INSTR_TIME_SET_CURRENT(queryDesc->estate->progressive_explain_interval_time);
+
+ key.pid = MyProcPid;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ entry = (explainHashEntry *) hash_search(explainArray, &key, HASH_ENTER, &found);
+ entry->pid = MyProcPid;
+ entry->explain_count = 0;
+ entry->explain_duration = 0.0f;
+ strcpy(entry->plan, "");
+ entry->last_explain = 0;
+
+ LWLockRelease(ExplainHashLock);
+
+ /* Printing progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /*
+ * Update explain plan only if enough time has passed since the previous print.
+ */
+ if (elapsed_time(&node->state->progressive_explain_interval_time) * 1000.0 > progressive_explain_interval)
+ {
+ node->state->progressive_explain_current_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->progressive_explain_current_node = NULL;
+ INSTR_TIME_SET_CURRENT(node->state->progressive_explain_interval_time);
+ }
+}
+
+/*
+ * ExplainTrackQueryReleaseFunc
+ * Memory context release callback function to remove
+ * plan from explain hash.
+ */
+static void
+ExplainTrackQueryReleaseFunc(void *arg)
+{
+ /* Remove row from hash */
+ explainHashKey key;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ hash_search(explainArray, &key, HASH_REMOVE, NULL);
+ LWLockRelease(ExplainHashLock);
+}
+
+/*
+ * ExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends, max_parallel_workers);
+
+ size = add_size(size, hash_estimate_size(max_table_size, add_size(sizeof(explainHashEntry), progressive_explain_output_size)));
+
+ return size;
+}
+
+/*
+ * InitExplainHash
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(explainHashKey);
+ info.entrysize = sizeof(explainHashEntry) + progressive_explain_output_size;
+
+ explainArray = ShmemInitHash("explain hash",
+ MaxBackends + max_parallel_workers,
+ MaxBackends + max_parallel_workers,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+ char duration[1024];
+#define EXPLAIN_ACTIVITY_COLS 5
+ HASH_SEQ_STATUS hash_seq;
+ explainHashEntry *entry;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, explainArray);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ values[0] = Int32GetDatum(entry->pid);
+ values[1] = TimestampTzGetDatum(entry->last_explain);
+ values[2] = Int32GetDatum(entry->explain_count);
+ snprintf(duration, sizeof(duration), "%.3f", 1000.0 * entry->explain_duration);
+ values[3] = CStringGetTextDatum(duration);
+
+ if (superuser())
+ values[4] = CStringGetTextDatum(entry->plan);
+ else
+ {
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ bool found = false;
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == entry->pid)
+ {
+ found = true;
+ if (beentry->st_userid == GetUserId())
+ values[4] = CStringGetTextDatum(entry->plan);
+ else
+ values[4] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+
+ if (!found)
+ values[4] = CStringGetTextDatum("<insufficient privilege>");
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ }
+ LWLockRelease(ExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a06295b6ba..9ceeebce70 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -61,6 +61,7 @@
#include "utils/partcache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
+#include "commands/explain.h"
/* Hooks for plugins to get control in ExecutorStart/Run/Finish/End */
@@ -172,6 +173,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Adding back reference to QueryDesc
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -258,6 +264,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain)
+ ProgressiveExplainBegin(queryDesc);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeea..08467f3ce5 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -118,9 +118,13 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "commands/explain.h"
+#include "utils/guc.h"
+#include "common/pg_prng.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
+static TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
static bool ExecShutdownNode_walker(PlanState *node, void *context);
@@ -461,8 +465,12 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ if (progressive_explain)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
else
node->ExecProcNode = node->ExecProcNodeReal;
@@ -489,6 +497,31 @@ ExecProcNodeInstr(PlanState *node)
return result;
}
+/*
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled
+ */
+static TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update progressive explain based on sampling.
+ */
+ if (pg_prng_double(&pg_global_prng_state) < progressive_explain_sample_rate)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
/* ----------------------------------------------------------------
* MultiExecProcNode
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 2d3569b374..ac17c7e9d8 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..efb4af1971 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "commands/explain.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up instrumented explain hash table
+ */
+ InitExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0b53cba807..0eef52c211 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -345,6 +345,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ExplainHash "Waiting to access backend explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c9d8cd796a..10c0f5e557 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -46,6 +46,7 @@
#include "commands/vacuum.h"
#include "common/file_utils.h"
#include "common/scram-common.h"
+#include "commands/explain.h"
#include "jit/jit.h"
#include "libpq/auth.h"
#include "libpq/libpq.h"
@@ -474,6 +475,14 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -528,6 +537,13 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+bool progressive_explain = false;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
+int progressive_explain_output_size = 4096;
+double progressive_explain_sample_rate = 0.01;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2095,6 +2111,39 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_verbose", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Controls whether information about modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3742,6 +3791,28 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_output_size", PGC_POSTMASTER, QUERY_TUNING_METHOD,
+ gettext_noop("Sets the size reserved for pg_stat_progress_explain.explain, in bytes."),
+ NULL
+ },
+ &progressive_explain_output_size,
+ 4096, 100, 1048576,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -4023,6 +4094,17 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_sample_rate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Fraction of rows processed by the query until progressive_explain_interval is evaluated "
+ "to print a progressive plan."),
+ gettext_noop("Use a value between 0.0 (never) and 1.0 (always).")
+ },
+ &progressive_explain_sample_rate,
+ 0.01, 0.0, 1.0,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0.0, 0.0, 0.0, NULL, NULL, NULL
@@ -5235,6 +5317,16 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..bded926a9b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12429,4 +12429,14 @@
proargtypes => 'int2',
prosrc => 'gist_stratnum_identity' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'bool',
+ proallargtypes => '{bool,int4,timestamptz,int4,text,text}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{mode,pid,last_explain,explain_count,total_explain_time,explain}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ea7419951f..05ee8db3b9 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -16,6 +16,7 @@
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
+#include "datatype/timestamp.h"
typedef enum ExplainSerializeOption
{
@@ -67,12 +68,28 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
} ExplainState;
+typedef struct explainHashKey
+{
+ int pid; /* PID */
+} explainHashKey;
+
+typedef struct explainHashEntry
+{
+ explainHashKey key; /* hash key of entry - MUST BE FIRST */
+ int pid;
+ TimestampTz last_explain;
+ int explain_count;
+ float explain_duration;
+ char plan[];
+} explainHashEntry;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -144,4 +161,9 @@ extern void ExplainCloseGroup(const char *objtype, const char *labelname,
extern DestReceiver *CreateExplainSerializeDestReceiver(ExplainState *es);
+extern void ProgressiveExplainBegin(QueryDesc *queryDesc);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern Size ExplainHashShmemSize(void);
+extern void InitExplainHash(void);
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 5a6eff75c6..d431877efe 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -108,6 +108,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 2912741607..102ecffc14 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -56,6 +56,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -735,6 +736,10 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ struct QueryDesc *query_desc;
+ instr_time progressive_explain_interval_time;
+ struct PlanState *progressive_explain_current_node;
} EState;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf56545238..43f10a5186 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 532d6642bb..e84f06046c 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -274,6 +274,13 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT bool progressive_explain;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_output_size;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT double progressive_explain_sample_rate;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
--
2.34.1
Hi Rafael,
This sounds like a great feature, thanks for working on it and sharing
the patch. Let me share some comments / questions after a quick review.
On 12/30/24 02:18, Rafael Thofehrn Castro wrote:
Hello community,
CONTEXT:
Back in October I presented the talk "Debugging active queries with
mid-flight instrumented explain plans" at PGConf EU 2024
(recording: https://www.youtube.com/watch?v=6ahTb-7C05c <https://
www.youtube.com/watch?v=6ahTb-7C05c>) presenting
an experimental feature that enables visualization of in progress
EXPLAIN ANALYZE executions. Given the positive feedback and requests,
I am sending this patch with the feature, which I am calling Progressive
Explain.
Really nice talk, I enjoyed watching it a couple weeks back.
PROPOSAL:
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them with a new view: pg_stat_progress_explain.

Plans are only printed if the new GUC parameter progressive_explain is
enabled.
Aren't the names of the view / GUC a bit misleading? Because this is
really about EXPLAIN ANALYZE, not just EXPLAIN. Not sure.
For regular queries or queries started with EXPLAIN (without ANALYZE)
the plan is printed only once at the start.
Initially I thought this is a bit weird, and also inconsistent with the
auto_explain (which prints plans at the end). But I think for "progress"
that makes sense, even if it's not updated after that.
For instrumented runs (started via EXPLAIN ANALYZE or when auto_explain
flag log_analyze is enabled), the plan is printed on a fixed interval
controlled by the new GUC parameter progressive_explain_interval. This plan
includes all instrumentation stats computed so far (per node rows and
execution time).
OK, I understand why it works like this. But I rather dislike that it
relies on auto_explain to enable the progressive updates. If we want to
make this dependent on auto_explain, then most of this could/should be
moved to auto_explain, and called through a hook. Or something similar.
If we want to add this to core, then I think the progressive_explain GUC
should have "off|explain|analyze" values, or a similar way to enable the
instrumentation and regular updates. But that would probably also need
to duplicate a number of auto_explain options (e.g. to allow disabling
timings).
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_explain: timestamp when plan was last printed
- explain_count: amount of times plan was printed
- total_explain_time: accumulated time spend printing plans (in ms)
- explain: the actual plan (limited read privileges)
I find the "explain_time" a bit misleading. On the one hand - yes, it
does account for time spent generating the EXPLAIN output. But on the
other hand, that's only a part of the overhead - it ignores the overhead
of the extra instrumentation. Also, wouldn't it be better to rename
"explain" to "query plan"? That's what the EXPLAIN result says too.
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: bool
- default: off
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
Seems reasonable (but I already commented on progressive_explain).
- progressive_explain_sample_rate: fraction of rows processed by the
query until progressive_explain_interval is evaluated to print a
progressive plan
- type: floating point
- default: 0.1
- range: (0.0 - 1.0)
- context: user
I find this pretty weird / unnecessary. More comments later.
- progressive_explain_output_size: max output size of the plan
printed in the in-memory hash table.
- type: int
- default: 4096
- min: 100
- context: postmaster
Seems far too small.
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- context: user
Good idea. When processing the view automatically, JSON would be much
better than text.
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
This seems to be the duplication of some auto_explain parameters that I
mentioned above. Except that it only duplicates some of them. I'd like
the ability to disable collecting timing info, which is usually by far
the main overhead.
DEMONSTRATION:
postgres=# SET progressive_explain = ON;
SET
postgres=# EXPLAIN ANALYZE SELECT *
FROM test t1
UNION ALL
SELECT *
FROM test t1;

postgres=# select * from pg_stat_progress_explain;
-[ RECORD 1 ]------+---------------------------------------------------------------------------------------------------------------------------------------
pid                | 299663
last_explain       | 2024-12-29 22:40:33.016833+00
explain_count      | 5
total_explain_time | 0.205
explain            | Append  (cost=0.00..466670.40 rows=20000160 width=37) (actual time=0.052..3372.318 rows=14013813 loops=1)
                   |   Buffers: shared hit=4288 read=112501
                   |   ->  Seq Scan on test t1  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.052..1177.428 rows=10000000 loops=1)
                   |         Buffers: shared hit=4288 read=79046
                   |   ->  Seq Scan on test t1_1  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.072..608.481 rows=4013813 loops=1) (current)
                   |         Buffers: shared read=33455
                   |

IMPLEMENTATION HIGHLIGHTS:

- The initial plan is printed at the end of standard_ExecutorStart
if progressive_explain is enabled, for both regular queries and
instrumented ones (EXPLAIN ANALYZE):

/*
 * Start progressive explain if enabled.
 */
if (progressive_explain)
    ProgressiveExplainBegin(queryDesc);

- The incremental plan print for instrumented runs uses a dedicated
ExecProcNode if progressive_explain is enabled:

if (node->instrument)
    if (progressive_explain)
        node->ExecProcNode = ExecProcNodeInstrExplain;
    else
        node->ExecProcNode = ExecProcNodeInstr;
else
    node->ExecProcNode = node->ExecProcNodeReal;

- ExecProcNodeInstrExplain is identical to ExecProcNodeInstr with an
additional part to print plans based on a sampling logic:
I haven't looked at the details yet, but this seems mostly reasonable.
/*
* Update progressive explain based on sampling.
*/
if (pg_prng_double(&pg_global_prng_state) < progressive_explain_sample_rate)
    ProgressiveExplainUpdate(node);

That logic was added because ExecProcNodeInstrExplain is called once per
row processed (a lot of times) and performing the timestamp interval
check with progressive_explain_interval to decide whether to print
the plan (done inside ProgressiveExplainUpdate) is expensive. Benchmarks
(shared at the end of this email) show that sampling + timestamp check
gives much better results than performing the timestamp check at every
ExecProcNodeInstrExplain call.
I don't think this is the way to deal with the high cost of collecting
timing information. It just adds unpredictability, because how would you
know what's a good sample rate? What if you pick a far too low value,
and then a plan gets logged much less frequently because for that
particular query it takes much longer to return a tuple?
I think the right solution to deal with high cost of timing are
timeouts. Set a timeout, with a handler that sets a "pending" flag, and
check the flag before calling ProgressiveExplainUpdate(). See for
example TransactionTimeoutHandler in postinit.c.
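To sketch what I mean (only an illustration, not tested; PROGRESSIVE_EXPLAIN_TIMEOUT
and the helper names are made up, while enable_timeout_after()/RegisterTimeout()
are the existing timeout.c API):

#include <signal.h>
#include "postgres.h"
#include "commands/explain.h"
#include "nodes/execnodes.h"
#include "utils/guc.h"
#include "utils/timeout.h"

static volatile sig_atomic_t ProgressiveExplainPending = false;

/* signal-safe handler: only raise a flag, defer the real work */
static void
ProgressiveExplainTimeoutHandler(void)
{
    ProgressiveExplainPending = true;
}

/* backend startup, next to the other RegisterTimeout() calls */
static void
ProgressiveExplainInitTimeout(void)
{
    RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
                    ProgressiveExplainTimeoutHandler);
}

/* per-tuple path: a flag test instead of reading the clock for every row */
static void
ProgressiveExplainCheckPending(PlanState *node)
{
    if (ProgressiveExplainPending)
    {
        ProgressiveExplainPending = false;
        ProgressiveExplainUpdate(node);
        /* re-arm for the next interval (the GUC is in milliseconds) */
        enable_timeout_after(PROGRESSIVE_EXPLAIN_TIMEOUT,
                             progressive_explain_interval);
    }
}

That keeps the per-row overhead to a single flag test, and the update frequency
is governed by the timeout rather than by how fast the node returns tuples.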
- The plans are stored in a shared hash object (explainArray) allocated
at database start, similar to procArray. ExplainHashShmemSize() computes
shared memory needed for it, based on max_connections + max_parallel_workers
for the amount of elements in the array and progressive_explain_output_size
for the size per element.
Why do we need a shared hash table?
- A memory context release callback is configured in the memory context
where the query is running, being responsible for updating explainArray
even when the query doesn't finish gracefully.
OK
- Instrumented plans being printed incrementally need to clone
instrumentation
objects to change them, so each print uses a dedicated memory context
that gets released after the output is constructed. This avoids extended
private memory usage:
Yeah. I was wondering how you're going to deal with this. Wouldn't it be
cheaper to just use a static variable? I don't think this is called
recursively, and that'd save the palloc/pfree. Haven't tried and not
sure if it's worth it.
/* Dedicated memory context for the current plan being printed */
tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
"Progressive Explain Temporary Context",
ALLOCSET_DEFAULT_SIZES);
Maybe it'd be better to keep this memory context for the query duration
(e.g. by adding it to queryDesc), and just reset it before the calls?
That'd cache the memory, and it shouldn't really use a lot of it, no?
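Something along these lines is what I have in mind (just a sketch; the
pe_context field on EState is made up for the example):

#include "postgres.h"
#include "executor/execdesc.h"
#include "nodes/execnodes.h"
#include "utils/memutils.h"

static void
ProgressiveExplainPrintSketch(QueryDesc *queryDesc)
{
    EState     *estate = queryDesc->estate;
    MemoryContext oldCxt;

    /* pe_context is a hypothetical MemoryContext field added to EState */
    if (estate->pe_context == NULL)
        estate->pe_context = AllocSetContextCreate(estate->es_query_cxt,
                                                   "Progressive Explain Context",
                                                   ALLOCSET_DEFAULT_SIZES);

    /* O(1) bulk free of everything the previous print allocated */
    MemoryContextReset(estate->pe_context);
    oldCxt = MemoryContextSwitchTo(estate->pe_context);

    /* ... build the ExplainState and copy the text into the shared entry ... */

    MemoryContextSwitchTo(oldCxt);
    /* no pfree(es->str) / pfree(es) needed: the next reset reclaims it all */
}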
Also, ProgressiveExplainPrint() does this at the end:
+ /*
+ * Free local explain state before exiting as this function may be
+ * called multiple times in the same memory context.
+ */
+ pfree(es->str);
+ pfree(es);
Isn't this actually pointless with the local memory context?
- A new version of InstrEndLoop (InstrEndLoopForce) was created that can be
called on in-progress instrumentation objects. Those are common
when traversing the plan tree of an active query.
No opinion yet.
- Column explain from pg_stat_progress_explain can only be visualized by
superusers or the same role that is running the query. If none of those
conditions are met, users will see "<insufficient privilege>".
I think this is in line with how we restrict access to similar catalogs.
- For instrumented runs, the printed plan includes 2 per-node modifiers when
appropriate:

<current>: the plan node currently being processed.
<never executed>: a plan node not processed yet.
Not sure the <current> label is all that useful. It seems misleading at
best, because it's simply the last node that generated the explain. But
we might have already moved to a different node. And that node may be
stuck / very expensive, yet we'll see the plan as seemingly waiting in
some other node.
IMPLEMENTATION OVERHEAD:
When not used, the overhead added is:
- One IF at standard_ExecutorStart to check if progressive_explain is
enabled
- For instrumented runs (EXPLAIN ANALYZE), one IF at ExecProcNodeFirst
to define ExecProcNode wrapper

BENCHMARKS:
Performed 3 scenarios of benchmarks:

A) Comparison between unpatched PG18, patched with progressive explain
disabled and patched with the feature enabled globally (all queries printing
the plan at query start):

- PG18 without patch:

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -S -n -T 120 -c 30
number of transactions actually processed: 2173978
latency average = 1.655 ms
tps = 18127.977529 (without initial connection time)

- PG18 with patch:
-- progressive_explain = off
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -S -n -T 120 -c 30
number of transactions actually processed: 2198806
latency average = 1.636 ms
tps = 18333.157809 (without initial connection time)

-- progressive_explain = on (prints plan only once per query)
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -S -n -T 120 -c 30
number of transactions actually processed: 2047459
latency average = 1.756 ms
tps = 17081.477199 (without initial connection time)

B) EXPLAIN ANALYZE performance with different progressive_explain_interval
settings in patched:

-- progressive_explain = off

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1
-f script.sql
number of transactions actually processed: 27
latency average = 4492.845 ms

-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- progressive_explain_sample_rate = 0.01 (default)

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 300 -c 1
-f script.sql
number of transactions actually processed: 26
latency average = 4656.067 ms

-- progressive_explain = on
-- progressive_explain_interval = 10ms
-- progressive_explain_sample_rate = 0.01 (default)

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 300 -c 1
-f script.sql
number of transactions actually processed: 26
latency average = 4785.608 ms

C) EXPLAIN ANALYZE performance in patched with and without
progressive_explain_sample_rate, i.e., sampling with 2 different values
and also no sampling logic:

-- progressive_explain = off
postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1
-f script.sql
number of transactions actually processed: 27
latency average = 4492.845 ms

-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- progressive_explain_sample_rate = 0.01 (default)

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1
-f script.sql
number of transactions actually processed: 26
latency average = 4656.067 ms

-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- progressive_explain_sample_rate = 1

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1
-f script.sql
number of transactions actually processed: 19
latency average = 6432.902 ms

-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- NO SAMPLING LOGIC

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1
-f script.sql
number of transactions actually processed: 21
latency average = 5864.820 ms

BENCHMARK RESULTS:
It definitely needs more testing but preliminary results show that:
- From (A) we see that the patch adds negligible overhead when the
feature is not used. Enabling globally reduces overall TPS as all
queries are spending time printing the plan. The idea is to enable
progressive_explain on a per-need basis, only to a subset of sessions
that need it.

- From (B) we see that using progressive explain slightly increases
total execution time. Difference between using progressive_explain_interval
set to 1s (plan printed 4 times per query in the test) and to
10ms (plan printed ~460 times per query in the test) is very small.
The actual overhead appears when changing progressive_explain_sample_rate.

- From (C) we see that progressive_explain_sample_rate with a low
value (default 0.01) performs better than not using sampling or
using progressive_explain_sample_rate = 1. So the overhead of having
the sampling logic is much lower than not sampling at all.
No opinion. I need to do some testing / benchmarking myself.
TESTS:
Currently working on tests for a second version of the patch.
DOCUMENTATION:
Added documentation for the new view pg_stat_progress_explain,
new GUCs and a new item in section 14.1:

14.1. Using EXPLAIN
14.1.1. EXPLAIN Basics
14.1.2. EXPLAIN ANALYZE
14.1.3. Caveats
14.1.4. Progressive EXPLAIN

FURTHER DISCUSSION:
Considering that this patch introduces a new major feature with
several new components (view, GUCs, etc), there is open room for
discussion such as:

- Do the columns in pg_stat_progress_explain make sense? Are we
missing or adding unnecessary columns?

- Do the new GUCs make sense and are their default values appropriate?

- Do we want progressive explain to print plans of regular queries
started without EXPLAIN if progressive_explain is enabled or should
the feature be restricted to instrumented queries (EXPLAIN ANALYZE)?

- Is the size of explainHash based on max_connections + max_parallel_workers
large enough or are there other types of backends that use the
executor and will print plans too?
I've commented on some of these items earlier.
regards
--
Tomas Vondra
On 12/30/24 2:18 AM, Rafael Thofehrn Castro wrote:
Hello community,
CONTEXT:
Back in October I presented the talk "Debugging active queries with
mid-flight instrumented explain plans" at PGConf EU 2024
(recording: https://www.youtube.com/watch?v=6ahTb-7C05c) presenting
an experimental feature that enables visualization of in progress
EXPLAIN ANALYZE executions. Given the positive feedback and requests,
I am sending this patch with the feature, which I am calling Progressive
Explain.

PROPOSAL:
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them with a new view: pg_stat_progress_explain.
Hello
Thanks for your contribution! Such feature would be very useful.
I did not look carefully at the implementation. I just wanted to let
you know someone already posted a similar feature in this thread:
/messages/by-id/CADdR5ny_0dFwnD+suBnV1Vz6NDKbFHeWoV1EDv9buhDCtc3aAA@mail.gmail.com
Maybe it could give you some ideas.
Regards,
--
Adrien NAYRAT
Hi,
Thanks for the valuable feedback Tomas. I am sending a new version of the
patch that includes:

- Changed instrumented plan printing to be based on timeouts instead of
sampling. This works perfectly and benchmarks are promising. So the new GUC
progressive_explain_sample_rate is removed.

- Removed the parts of the code that explicitly released memory allocated
for instrumentation and explain objects, as this is done automatically when
the custom memory context is released. More comments on this below.

- Adjusted the expected regression test output so tests pass.
OK, I understand why it works like this. But I rather dislike that it
relies on auto_explain to enable the progressive updates. If we want to
make this dependent on auto_explain, then most of this could/should be
moved to auto_explain, and called through a hook. Or something similar.
If we want to add this to core, then I think the progressive_explain GUC
should have "off|explain|analyze" values, or a similar way to enable the
instrumentation and regular updates. But that would probably also need
to duplicate a number of auto_explain options (e.g. to allow disabling
timings).
This seems to be the duplication of some auto_explain parameters that I
mentioned above. Except that it only duplicates some of them. I'd like
the ability to disable collecting timing info, which is usually by far
the main overhead.
I implemented the current logic with the premise that this new feature
shouldn't change how queries are executed. If a query is not using
EXPLAIN ANALYZE then it shouldn't transparently enable instrumentation,
which would then cause additional overhead. If someone wants an instrumented
run, they explicitly call EXPLAIN ANALYZE.
That is the same reasoning for coming up with the GUCs. The only ones
available are progressive_explain_format, progressive_explain_settings
and progressive_explain_verbose, which will only affect the explain
output without changing how the query is executed.
But if we come to an agreement that "off|explain|analyze" values for the
progressive_explain GUC are a better approach, changing the logic should
be easy. Now that you mention the idea, it does make sense.
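Just to sketch what that could look like (purely illustrative, not part of
the attached patch; the PROGRESSIVE_EXPLAIN_* names are made up), guc_tables.c
already has the pattern for such an enum GUC:

#include "utils/guc.h"

/* hypothetical three-way mode replacing the current boolean */
typedef enum
{
    PROGRESSIVE_EXPLAIN_OFF,        /* feature disabled */
    PROGRESSIVE_EXPLAIN_EXPLAIN,    /* print the plan once, no extra instrumentation */
    PROGRESSIVE_EXPLAIN_ANALYZE     /* instrument and refresh the plan periodically */
} ProgressiveExplainMode;

static const struct config_enum_entry progressive_explain_mode_options[] = {
    {"off", PROGRESSIVE_EXPLAIN_OFF, false},
    {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
    {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
    {NULL, 0, false}
};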
I think the right solution to deal with high cost of timing are
timeouts. Set a timeout, with a handler that sets a "pending" flag, and
check the flag before calling ProgressiveExplainUpdate(). See for
example TransactionTimeoutHandler in postinit.c.
Thanks for that. The new patch sent in this message uses a timeout instead.
Why do we need a shared hash table?
This is where I am storing the printed plans so other backends can read
them. I saw that as the only option apart from printing to files. Do
you recommend something else?
Yeah. I was wondering how you're going to deal with this. Wouldn't it be
cheaper to just use a static variable? I don't think this is called
recursively, and that'd save the palloc/pfree. Haven't tried and not
sure if it's worth it.
You mean an instrumentation object allocated only once that will be updated
with the counters of the currently processed node? That is also an option.
Maybe it is worth testing, considering that it may save millions of
palloc/pfree calls.
Will give it a go.
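Roughly what I have in mind (a sketch only, assuming the copy is fully
consumed while printing the current node, before any other node is cloned;
InstrEndLoopForce is the helper added by the patch):

#include "postgres.h"
#include "executor/instrument.h"

/* one reusable copy per backend instead of a palloc/pfree per printed node */
static Instrumentation progressive_explain_instr;

static Instrumentation *
clone_running_instrument(Instrumentation *instr)
{
    progressive_explain_instr = *instr;             /* plain struct copy */
    InstrEndLoopForce(&progressive_explain_instr);  /* safe: acts on the copy only */
    return &progressive_explain_instr;
}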
Maybe it'd be better to keep this memory context for the query duration
(e.g. by adding it to queryDesc), and just reset it before the calls?
That'd cache the memory, and it shouldn't really use a lot of it, no?
Instead of having a dedicated memory context, use the same context as the
one running the query? This will then require manually freeing the
ExplainState allocated in each ProgressiveExplainPrint() call (in this patch
I removed that manual freeing, which wasn't currently needed) and also the
allocated instrumentation objects (which will no longer be needed if we use
a static variable). This will save memory context creation/release and is
definitely a better option. Will test it.
Not sure the <current> label is all that useful. It seems misleading at
best, because it's simply the last node that generated the explain. But
we might have already moved to a different node. And that node may be
stuck / very expensive, yet we'll see the plan as seemingly waiting in
some other node.
Right. The idea here is that, for some specific cases, the execution is
momentarily limited to a subpart of the plan. For example, if a HashJoin
is present, the whole set of nodes feeding the hash creation needs to
complete before other operations are done. If we see a <current> as part
of that hash creation we know that the query is currently in that region,
regardless of which specific node it is.
This could instead become a feature implemented in an extension with the
help of a new hook inside ExplainNode(), if we come to the agreement that
it is not really needed out of the box.
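For illustration only, such a hook could have a shape along these lines
(nothing like this exists in core today, and it is not part of the attached
patch):

#include "commands/explain.h"
#include "nodes/execnodes.h"

/* hypothetical per-node hook, called by ExplainNode() after the node's own output */
typedef void (*explain_per_node_hook_type) (PlanState *planstate,
                                            ExplainState *es);

extern PGDLLIMPORT explain_per_node_hook_type explain_per_node_hook;

static void
call_explain_per_node_hook(PlanState *planstate, ExplainState *es)
{
    if (explain_per_node_hook)
        (*explain_per_node_hook) (planstate, es);
}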
Thanks for your contribution! Such feature would be very useful.
I did not look carefully about the implementation. I just wanted to let
you know someone already posted a similar feature in this thread :
Maybe it could give you some ideas.
Thanks Adrien! Will have a look.
Regards,
Rafael Castro.
Attachments:
v4-0001-Proposal-for-progressive-explains.patch (application/octet-stream)
From d30e43dc227a2351aafec3492521abe4b2b1e6c7 Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Tue, 7 Jan 2025 19:32:16 +0000
Subject: [PATCH v4] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize via new view pg_stat_progress_explain.
Plans are only printed if new GUC parameter progressive_explain is
enabled.
For regular queries or queries started with EXPLAIN (without ANALYZE)
the plan is printed only once at the start.
For instrumented runs (started via EXPLAIN ANALYZE or when auto_explain
flag log_analyze is enabled) the plan is printed on a fixed interval
controlled by new GUC parameter progressive_explain_interval including
all instrumentation stats computed so far (per node rows and execution
time).
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_explain: timestamp when plan was last printed
- explain_count: amount of times plan was printed
- total_explain_time: accumulated time spent printing plans (in ms)
- explain: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: bool
- default: off
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_output_size: max output size of the plan
printed in the in-memory hash table.
- type: int
- default: 4096
- min: 100
- context: postmaster
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
---
doc/src/sgml/config.sgml | 109 ++++
doc/src/sgml/monitoring.sgml | 84 ++++
doc/src/sgml/perform.sgml | 100 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/commands/explain.c | 465 ++++++++++++++++--
src/backend/executor/execMain.c | 12 +
src/backend/executor/execProcnode.c | 35 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 10 +
src/backend/utils/misc/guc_tables.c | 80 +++
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 24 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 5 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 6 +
src/include/utils/timeout.h | 1 +
src/test/regress/expected/rules.out | 6 +
20 files changed, 930 insertions(+), 52 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a782f10998..8fb9342a41 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8362,6 +8362,115 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled; see
+ <xref linkend="using-explain-progressive"/>. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-output-size" xreflabel="progressive_explain_output_size">
+ <term><varname>progressive_explain_output_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_output_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the amount of memory reserved to store the text of the
+ progressive explain for each client backend or parallel worker, for the
+ <structname>pg_stat_progress_explain</structname>.<structfield>explain</structfield>
+ field. If this value is specified without units, it is taken as bytes.
+ The default value is <literal>4096 bytes</literal>.
+ This parameter can only be set at server start.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-sample-rate" xreflabel="progressive_explain_sample_rate">
+ <term><varname>progressive_explain_sample_rate</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_sample_rate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Fraction of rows processed by the query until
+ <xref linkend="guc-progressive-explain-interval"/> is evaluated
+ to print a progressive explain plan. The default value is
+ <literal>0.01</literal>, resulting in 1 check every 100 processed
+ rows.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4e917f159a..725fcb7c69 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6778,6 +6778,90 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_explain</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>explain_count</structfield> <type>integer</type>
+ </para>
+ <para>
+ Number of times the plan was printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_explain_time</structfield> <type>floating point</type>
+ </para>
+ <para>
+ Accumulated time spent printing plans (in ms).
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>explain</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text. By default the explain text is
+ truncated at 4096 bytes; this value can be changed via the
+ parameter <xref linkend="guc-progressive-explain-output-size"/>.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index a502a2aaba..4f09361829 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1109,6 +1109,106 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query, or
+ the detailed plan with row counts and accumulated run time when
+ <command>EXPLAIN ANALYZE</command> is used, can be visualized by any
+ session via the <xref linkend="pg-stat-progress-explain-view"/> view when
+ <xref linkend="guc-progressive-explain"/> is enabled in the client
+ backend or parallel worker executing the query. Settings
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/> and
+ <xref linkend="guc-progressive-explain-settings"/> can be adjusted
+ to customize the printed plan, which is truncated to the length limit defined by
+ <xref linkend="guc-progressive-explain-output-size"/>.
+ </para>
+
+ <para>
+ For queries executed without <command>EXPLAIN ANALYZE</command> the
+ plan is printed only once at the beginning of query execution:
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = on;
+SET
+
+SELECT *
+FROM test t1
+INNER JOIN test t2 ON (t1.c1 = t2.c1);
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT pid, explain_count, explain FROM pg_stat_progress_explain;
+ pid | explain_count | explain
+-----+---------------+--------------------------------------------------------------------------------
+ 159 | 1 | Hash Join (cost=1159375.00..3912500.00 rows=30000000 width=74)
+ | | Hash Cond: (t1.c1 = t2.c1)
+ | | -> Seq Scan on test t1 (cost=0.00..550000.00 rows=30000000 width=37)
+ | | -> Hash (cost=550000.00..550000.00 rows=30000000 width=37)
+ | | -> Seq Scan on test t2 (cost=0.00..550000.00 rows=30000000 width=37)
+ | |
+</screen>
+ </para>
+
+ <para>
+ When <command>EXPLAIN ANALYZE</command> is used the detailed plan is
+ printed progressively based on
+ <xref linkend="guc-progressive-explain-interval"/> and
+ <xref linkend="guc-progressive-explain-sample-rate"/> settings, including
+ per node accumulated row count and run time statistics computed so far. This
+ makes progressive explain a powerful ally when investigating bottlenecks in
+ expensive queries without having to wait for <command>EXPLAIN ANALYZE</command>
+ to finish.
+ </para>
+
+ <para>
+ Progressive explains include additional information per node to help analyzing
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = on;
+SET
+
+EXPLAIN ANALYZE SELECT *
+FROM test t1
+INNER JOIN test t2 ON (t1.c1 = t2.c1);
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT pid, explain_count, explain FROM pg_stat_progress_explain;
+ pid | explain_count | explain
+-----+---------------+----------------------------------------------------------------------------------------------------------------------------------------------
+ 159 | 7 | Hash Join (cost=1159375.00..3912500.00 rows=30000000 width=74) (never executed)
+ | | Hash Cond: (t1.c1 = t2.c1)
+ | | -> Seq Scan on test t1 (cost=0.00..550000.00 rows=30000000 width=37) (actual time=0.009..0.009 rows=1 loops=1)
+ | | -> Hash (cost=550000.00..550000.00 rows=30000000 width=37) (never executed)
+ | | -> Seq Scan on test t2 (cost=0.00..550000.00 rows=30000000 width=37) (actual time=0.004..2165.201 rows=27925599 loops=1) (current)
+ | |
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index cddc3ea9b5..427bd883cf 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1336,6 +1336,11 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c24e66f82e..db62347939 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -13,6 +13,8 @@
*/
#include "postgres.h"
+#include <time.h>
+
#include "access/xact.h"
#include "catalog/pg_type.h"
#include "commands/createas.h"
@@ -22,6 +24,8 @@
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
+#include "miscadmin.h"
+#include "funcapi.h"
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -40,6 +44,13 @@
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+#include "utils/backend_status.h"
+#include "storage/procarray.h"
+#include "executor/spi.h"
+#include "utils/guc.h"
+#include "utils/timeout.h"
+
+
/* Hook for plugins to get control in ExplainOneQuery() */
@@ -48,6 +59,11 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *explainArray = NULL;
+
+/* Flag set by timeouts to control when to print progressive explains */
+bool ProgressiveExplainPending = false;
/* Instrumentation data for SERIALIZE option */
typedef struct SerializeMetrics
@@ -140,7 +156,7 @@ static void show_hashagg_info(AggState *aggstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -180,6 +196,8 @@ static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
static void escape_yaml(StringInfo buf, const char *str);
static SerializeMetrics GetSerializationMetrics(DestReceiver *dest);
+void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ExplainTrackQueryReleaseFunc(void *);
@@ -392,6 +410,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -1504,6 +1524,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1967,24 +1988,56 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is called directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *planstate->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
if (es->timing)
+ /* Node in progress */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f) (current)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ /* Node in progress */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
appendStringInfo(es->str,
- " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
- startup_ms, total_ms, rows, nloops);
+ " (actual rows=%.0f loops=%.0f) (current)",
+ rows, nloops);
else
appendStringInfo(es->str,
" (actual rows=%.0f loops=%.0f)",
@@ -1999,6 +2052,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
3, es);
}
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == planstate->state->progressive_explain_current_node)
+ {
+ ExplainPropertyBool("Current", true, es);
+ }
ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
}
@@ -2107,29 +2165,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -2140,11 +2198,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2161,7 +2219,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2172,7 +2230,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2196,7 +2254,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2230,7 +2288,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2244,7 +2302,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2262,7 +2320,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2279,14 +2337,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2296,7 +2354,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2306,11 +2364,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2319,11 +2377,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2332,11 +2390,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2344,13 +2402,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(((WindowAgg *) plan)->runConditionOrig,
"Run Condition", planstate, ancestors, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
@@ -2360,7 +2418,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2382,7 +2440,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2427,10 +2485,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3947,19 +4005,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4625,7 +4683,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4634,11 +4692,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4658,11 +4729,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Clone instrumentation */
+ if (es->progressive)
+ {
+ local_instr = palloc0(sizeof(*local_instr));
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -5917,3 +6001,286 @@ GetSerializationMetrics(DestReceiver *dest)
return empty;
}
+
+
+/*
+ * ProgressiveExplainPrint
+ * Prints progressive explain in memory.
+ *
+ * This operation needs to be done in a dedicated memory context
+ * as plans for instrumented runs will be printed multiple times
+ * and instrumentation objects need to be cloned so that stats
+ * can get updated without interfering with original objects.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ MemoryContext tmpCxt;
+ MemoryContext oldCxt;
+ instr_time starttime;
+
+ INSTR_TIME_SET_CURRENT(starttime);
+
+ /* Dedicated memory context for the current plan being printed */
+ tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Progressive Explain Temporary Context",
+ ALLOCSET_DEFAULT_SIZES);
+ oldCxt = MemoryContextSwitchTo(tmpCxt);
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ explainHashKey key;
+ explainHashEntry *entry;
+ ExplainState *es;
+
+ es = NewExplainState();
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+ /* Instrumentation details come from the active QueryDesc */
+ es->analyze = queryDesc->instrument_options;
+ es->buffers = (queryDesc->instrument_options &
+ INSTRUMENT_BUFFERS) != 0;
+ es->wal = (queryDesc->instrument_options &
+ INSTRUMENT_WAL) != 0;
+ es->timing = (queryDesc->instrument_options &
+ INSTRUMENT_TIMER) != 0;
+ es->summary = (es->analyze);
+
+ /* Additional options come from progressive GUC settings */
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ key.pid = MyProcPid;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ entry = (explainHashEntry *) hash_search(explainArray, &key, HASH_FIND, NULL);
+
+
+ if (entry)
+ {
+ entry->explain_count++;
+ strncpy(entry->plan, es->str->data, progressive_explain_output_size);
+ entry->explain_duration += elapsed_time(&starttime);
+ entry->last_explain = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ExplainHashLock);
+ }
+
+ /* Clean up temp context */
+ MemoryContextSwitchTo(oldCxt);
+ MemoryContextDelete(tmpCxt);
+}
+
+/*
+ * ProgressiveExplainBegin
+ * Enables progressive explain progress tracking for a query in the local backend.
+ *
+ * A progressive explain is printed at the beginning of every query if
+ * progressive_explain is enabled.
+ *
+ * For instrumented runs started with EXPLAIN ANALYZE the progressive plan is printed
+ * via ExecProcNodeInstrExplain at a regular interval controlled by progressive_explain_interval.
+ *
+ * Plans are stored in shared memory object explainArray that needs to be properly
+ * cleared when the query finishes or gets canceled. This is achieved with the help
+ * of a memory context callback configured in the same memory context where the query
+ * descriptor was created. This strategy allows cleaning explainArray even when the
+ * query doesn't finish gracefully.
+ */
+void
+ProgressiveExplainBegin(QueryDesc *queryDesc)
+{
+ explainHashKey key;
+ explainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ExplainTrackQueryReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ INSTR_TIME_SET_CURRENT(queryDesc->estate->progressive_explain_interval_time);
+
+ key.pid = MyProcPid;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ entry = (explainHashEntry *) hash_search(explainArray, &key, HASH_ENTER, &found);
+ entry->pid = MyProcPid;
+ entry->explain_count = 0;
+ entry->explain_duration = 0.0f;
+ strcpy(entry->plan, "");
+ entry->last_explain = 0;
+
+ LWLockRelease(ExplainHashLock);
+
+ /* Printing progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+
+ /* Enable timeout for instrumented runs */
+ if (queryDesc->instrument_options)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT, GetCurrentTimestamp(),
+ progressive_explain_interval);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ node->state->progressive_explain_current_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->progressive_explain_current_node = NULL;
+ INSTR_TIME_SET_CURRENT(node->state->progressive_explain_interval_time);
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ExplainTrackQueryReleaseFunc
+ * Memory context release callback function to remove
+ * plan from explain hash.
+ */
+static void
+ExplainTrackQueryReleaseFunc(void *)
+{
+ /* Remove row from hash */
+ explainHashKey key;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ hash_search(explainArray, &key, HASH_REMOVE, NULL);
+ LWLockRelease(ExplainHashLock);
+
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+}
+
+/*
+ * ExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends, max_parallel_workers);
+
+ size = add_size(size, hash_estimate_size(max_table_size, add_size(sizeof(explainHashEntry), progressive_explain_output_size)));
+
+ return size;
+}
+
+/*
+ * InitExplainHash
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(explainHashKey);
+ info.entrysize = sizeof(explainHashEntry) + progressive_explain_output_size;
+
+ explainArray = ShmemInitHash("explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+ char duration[1024];
+#define EXPLAIN_ACTIVITY_COLS 5
+ HASH_SEQ_STATUS hash_seq;
+ explainHashEntry *entry;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, explainArray);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ values[0] = entry->pid;
+ values[1] = TimestampTzGetDatum(entry->last_explain);
+ values[2] = entry->explain_count;
+ sprintf(duration, "%.3f", 1000.0 * entry->explain_duration);
+ values[3] = CStringGetTextDatum(duration);
+
+ if (superuser())
+ values[4] = CStringGetTextDatum(entry->plan);
+ else
+ {
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ bool found = false;
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == entry->pid)
+ {
+ found = true;
+ if (beentry->st_userid == GetUserId())
+ values[4] = CStringGetTextDatum(entry->plan);
+ else
+ values[4] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+
+ if (!found)
+ values[4] = CStringGetTextDatum("<insufficient privilege>");
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ }
+ LWLockRelease(ExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index fb8dba3ab2..83fcdf15a6 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -61,6 +61,7 @@
#include "utils/partcache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
+#include "commands/explain.h"
/* Hooks for plugins to get control in ExecutorStart/Run/Finish/End */
@@ -172,6 +173,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Adding back reference to QueryDesc
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -258,6 +264,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain)
+ ProgressiveExplainBegin(queryDesc);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeea..952aa1ba87 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -118,9 +118,13 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "commands/explain.h"
+#include "utils/guc.h"
+#include "common/pg_prng.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
+static TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
static bool ExecShutdownNode_walker(PlanState *node, void *context);
@@ -461,8 +465,12 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ if (progressive_explain)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
else
node->ExecProcNode = node->ExecProcNodeReal;
@@ -489,6 +497,31 @@ ExecProcNodeInstr(PlanState *node)
return result;
}
+/*
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled
+ */
+static TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
/* ----------------------------------------------------------------
* MultiExecProcNode
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 2d3569b374..ac17c7e9d8 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..efb4af1971 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "commands/explain.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up instrumented explain hash table
+ */
+ InitExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f07162..890acf02da 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -346,6 +346,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ExplainHash "Waiting to access backend explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 01bb6a410c..1e1fd2460a 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -65,6 +65,7 @@
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/timeout.h"
+#include "commands/explain.h"
static HeapTuple GetDatabaseTuple(const char *dbname);
static HeapTuple GetDatabaseTupleByOid(Oid dboid);
@@ -78,6 +79,7 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -741,6 +743,8 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
}
/*
@@ -1397,6 +1401,12 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 38cb9e970d..706e15cca1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -46,6 +46,7 @@
#include "commands/vacuum.h"
#include "common/file_utils.h"
#include "common/scram-common.h"
+#include "commands/explain.h"
#include "jit/jit.h"
#include "libpq/auth.h"
#include "libpq/libpq.h"
@@ -474,6 +475,14 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -528,6 +537,12 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+bool progressive_explain = false;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
+int progressive_explain_output_size = 4096;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2096,6 +2111,39 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3743,6 +3791,28 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_output_size", PGC_POSTMASTER, STATS_MONITORING,
+ gettext_noop("Sets the size reserved for pg_stat_progress_explain.explain, in bytes."),
+ NULL
+ },
+ &progressive_explain_output_size,
+ 4096, 100, 1048576,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5236,6 +5306,16 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b8c2ad2a5..534bdcaca3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12454,4 +12454,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'bool',
+ proallargtypes => '{bool,int4,timestamptz,int4,text,text}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{mode,pid,last_explain,explain_count,total_explain_time,explain}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ea7419951f..cdcdac9f8d 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -16,6 +16,7 @@
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
+#include "datatype/timestamp.h"
typedef enum ExplainSerializeOption
{
@@ -67,12 +68,28 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
} ExplainState;
+typedef struct explainHashKey
+{
+ int pid; /* PID */
+} explainHashKey;
+
+typedef struct explainHashEntry
+{
+ explainHashKey key; /* hash key of entry - MUST BE FIRST */
+ int pid;
+ TimestampTz last_explain;
+ int explain_count;
+ float explain_duration;
+ char plan[];
+} explainHashEntry;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -144,4 +161,11 @@ extern void ExplainCloseGroup(const char *objtype, const char *labelname,
extern DestReceiver *CreateExplainSerializeDestReceiver(ExplainState *es);
+extern void ProgressiveExplainBegin(QueryDesc *queryDesc);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern Size ExplainHashShmemSize(void);
+extern void InitExplainHash(void);
+
+extern bool ProgressiveExplainPending;
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 5a6eff75c6..d431877efe 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -108,6 +108,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d0f2dca592..3225e67023 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -56,6 +56,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -751,6 +752,10 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ struct QueryDesc *query_desc;
+ instr_time progressive_explain_interval_time;
+ struct PlanState *progressive_explain_current_node;
} EState;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf56545238..43f10a5186 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 532d6642bb..3a4b1f2da5 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -274,6 +274,12 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT bool progressive_explain;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_output_size;
+extern PGDLLIMPORT int progressive_explain_format;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc..f2751c5b4d 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,7 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3361f6a69c..0552017506 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2040,6 +2040,12 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT pid,
+ last_explain,
+ explain_count,
+ total_explain_time,
+ explain
+ FROM pg_stat_progress_explain(true) pg_stat_progress_explain(pid, last_explain, explain_count, total_explain_time, explain);
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
--
2.34.1
Hello all,
Sending a new version of the patch that includes important changes addressing
feedback provided by Greg and Tomas. So, since the previous version (v5) sent
on Jan 29, these are the highlights of what has changed:
- The progressive plan printed at the regular interval defined by
progressive_explain_interval now uses timeouts. GUC
progressive_explain_sample_rate is removed.
- Objects allocated per plan print (Instrumentation and ExplainState) were
replaced by reusable objects allocated at query start (during the progressive
explain setup phase). So currently we are allocating only two objects for the
complete duration of the feature. With that, the temporary memory context that
was being allocated per iteration was removed.
- The progressive_explain GUC was changed from boolean to enum, accepting the
values 'off', 'explain' and 'analyze'. This allows using instrumented
progressive explains for any query and not only the ones started via EXPLAIN
ANALYZE. If the GUC is set to 'explain' the plan will be printed only once at
query start. If set to 'analyze', instrumentation will be enabled in the
QueryDesc and the detailed plan will be printed iteratively. Considering that
now we can enable instrumentation for regular queries, the following GUCs were
added to control which instruments are enabled: progressive_explain_buffers,
progressive_explain_timing and progressive_explain_wal.
- Better handling of the shared memory space where plans are printed and shared
with other backends. In the previous version we had a shared hash with elements
holding all data related to progressive explains, including the complete plan
string:

typedef struct explainHashEntry
{
    explainHashKey key;         /* hash key of entry - MUST BE FIRST */
    int         pid;
    TimestampTz last_explain;
    int         explain_count;
    float       explain_duration;
    char        plan[];
} explainHashEntry;

The allocated size per element used to be defined by
progressive_explain_output_size, which would essentially control the space
available for plan[]. Greg raised the concern of PG having to allocate too much
shared memory at database start considering that we need enough space for
max_connections + max_parallel_workers, and that is a totally valid point.

So the new version takes advantage of DSAs. Each backend creates its own DSA at
query start (if progressive explain is enabled) where the shared data is
stored. That DSA is shared with other backends via the hash structure through
dsa_handle and dsa_pointer pointers (a minimal sketch of this flow is included
after this list):

typedef struct progressiveExplainHashEntry
{
    progressiveExplainHashKey key;  /* hash key of entry - MUST BE FIRST */
    dsa_handle  h;
    dsa_pointer p;
} progressiveExplainHashEntry;

typedef struct progressiveExplainData
{
    int         pid;
    TimestampTz last_print;
    char        plan[];
} progressiveExplainData;

That allows us to allocate areas of custom sizes for plan[]. The strategy
currently used is to allocate an initial space of the size of the initial plan
output + PROGRESSIVE_EXPLAIN_ALLOC_SIZE (4096 currently), which gives PG enough
room for subsequent iterations where the new string may increase a bit, without
having to reallocate space. The code checks sizes and will reallocate if
needed. With that, GUC progressive_explain_output_size was removed.
- Adjusted columns of pg_stat_progress_explain. Columns explain_count and
total_explain_time were removed. Column last_explain was renamed to last_print.
Column explain was renamed to query_plan, as this is the name used by PG when
a plan is printed with EXPLAIN.
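
To make the DSA hand-off concrete, here is a minimal sketch of the write/read
flow, under stated assumptions; this is not the code from the attached patch.
The helper names publish_plan() and read_plan(), the lock name
ProgressiveExplainHashLock, and the layout of progressiveExplainHashKey (a
single pid field) are illustrative guesses, and lifetime management
(dsa_pin()/dsa_pin_mapping()) as well as error handling are omitted:

#include "postgres.h"

#include "miscadmin.h"
#include "storage/lwlock.h"
#include "utils/dsa.h"
#include "utils/hsearch.h"
#include "utils/timestamp.h"

/* Writer side: publish the current plan text through a per-backend DSA */
static void
publish_plan(HTAB *pe_hash, int tranche_id, const char *plan)
{
    progressiveExplainHashKey key = {.pid = MyProcPid};
    progressiveExplainHashEntry *entry;
    progressiveExplainData *pe;
    dsa_area   *area;
    dsa_pointer dp;
    Size        len = strlen(plan) + 1;

    /* One DSA per backend, created when progressive explain starts */
    area = dsa_create(tranche_id);

    /* Leave headroom so later, slightly larger plans fit without realloc */
    dp = dsa_allocate(area, offsetof(progressiveExplainData, plan) +
                      len + PROGRESSIVE_EXPLAIN_ALLOC_SIZE);

    pe = (progressiveExplainData *) dsa_get_address(area, dp);
    pe->pid = MyProcPid;
    pe->last_print = GetCurrentTimestamp();
    memcpy(pe->plan, plan, len);

    /* Share only the dsa_handle/dsa_pointer with other backends via the hash */
    LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
    entry = hash_search(pe_hash, &key, HASH_ENTER, NULL);
    entry->h = dsa_get_handle(area);
    entry->p = dp;
    LWLockRelease(ProgressiveExplainHashLock);
}

/* Reader side: the view function attaches, copies the text and detaches */
static char *
read_plan(progressiveExplainHashEntry *entry)
{
    dsa_area   *area = dsa_attach(entry->h);
    progressiveExplainData *pe = dsa_get_address(area, entry->p);
    char       *plan_copy = pstrdup(pe->plan);

    dsa_detach(area);
    return plan_copy;
}

On the writer side, the reallocation-on-overflow behaviour described above
would presumably compare the new plan length against the currently allocated
size and only call dsa_free()/dsa_allocate() again when the text no longer
fits.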
Rafael.
Attachments:
v5-0001-Proposal-for-progressive-explains.patch
From 2195695627ea3f127937b3c066c157b475127c6c Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Tue, 7 Jan 2025 19:32:16 +0000
Subject: [PATCH v5] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan will be printed
only once at the beginning of the query. If set to 'analyze' the QueryDesc
will be adjusted, adding instrumentation flags. In that case the plan
will be printed on a fixed interval controlled by the new GUC parameter
progressive_explain_interval, including all instrumentation stats
computed so far (per node rows and execution time).
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_print: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [TEXT, XML, JSON, or YAML]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
---
doc/src/sgml/config.sgml | 120 ++++
doc/src/sgml/monitoring.sgml | 64 ++
doc/src/sgml/perform.sgml | 95 +++
src/backend/catalog/system_views.sql | 5 +
src/backend/commands/explain.c | 580 ++++++++++++++++--
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 36 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 10 +
src/backend/utils/misc/guc_tables.c | 112 ++++
src/backend/utils/misc/postgresql.conf.sample | 12 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 42 ++
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 4 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 8 +
src/include/utils/timeout.h | 1 +
src/test/regress/expected/rules.out | 4 +
src/tools/pgindent/typedefs.list | 4 +
25 files changed, 1108 insertions(+), 56 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 336630ce417..ec44ccc693d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8411,6 +8411,126 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about buffers is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about per node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 928a6eb64b0..0a64ea78324 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6818,6 +6818,70 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describes the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_print</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index a502a2aaba3..349e8a66b65 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1109,6 +1109,101 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query can
+ be visualized by any session via the <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query. Settings
+ <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/> and
+ <xref linkend="guc-progressive-explain-settings"/>
+ define how the plan is printed and which details are added there.
+ </para>
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:37:41.781459-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ enables instrumentation, and the detailed plan is printed at a fixed interval
+ controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+ per-node accumulated row counts and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional information per node to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:36:03.580721-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=2010.504..2998.111 rows=38603 loops=1) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.068..303.963 rows=4928320 loops=1) (current)+
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=2009.824..2009.824 rows=10000000 loops=1) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.035..640.890 rows=10000000 loops=1) +
+ | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index eff0990957e..445aa317ad0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1338,6 +1338,11 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index dc4bef9ab81..d89b71ff684 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
#include "commands/defrem.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
+#include "funcapi.h"
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
@@ -30,6 +31,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
+#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/guc_tables.h"
#include "utils/json.h"
@@ -37,17 +39,27 @@
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
+#include "utils/timeout.h"
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+
+
+#define PROGRESSIVE_EXPLAIN_ALLOC_SIZE 4096
+
/* Hook for plugins to get control in ExplainOneQuery() */
ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainArray = NULL;
+
+/* Flag set by timeouts to control when to print progressive explains */
+bool ProgressiveExplainPending = false;
/* Instrumentation data for SERIALIZE option */
typedef struct SerializeMetrics
@@ -140,7 +152,7 @@ static void show_hashagg_info(AggState *aggstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -180,6 +192,9 @@ static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
static void escape_yaml(StringInfo buf, const char *str);
static SerializeMetrics GetSerializationMetrics(DestReceiver *dest);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(dsa_area *a);
+static void ProgressiveExplainReleaseFunc(void *);
@@ -392,6 +407,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -1504,6 +1521,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1967,28 +1985,65 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is done directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop for nodes that may
+ * still be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
if (es->timing)
- appendStringInfo(es->str,
- " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
- startup_ms, total_ms, rows, nloops);
+ {
+ /* Node in progress */
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f) (current)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
+ startup_ms, total_ms, rows, nloops);
+ }
else
- appendStringInfo(es->str,
- " (actual rows=%.0f loops=%.0f)",
- rows, nloops);
+ {
+ /* Node in progress */
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str,
+ " (actual rows=%.0f loops=%.0f) (current)",
+ rows, nloops);
+ else
+ appendStringInfo(es->str,
+ " (actual rows=%.0f loops=%.0f)",
+ rows, nloops);
+ }
}
else
{
@@ -1999,6 +2054,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
3, es);
}
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
}
@@ -2107,29 +2165,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -2140,11 +2198,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2161,7 +2219,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2172,7 +2230,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2196,7 +2254,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2230,7 +2288,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2244,7 +2302,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2262,7 +2320,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2279,14 +2337,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2296,7 +2354,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2306,11 +2364,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2319,11 +2377,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2332,11 +2390,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2344,13 +2402,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(((WindowAgg *) plan)->runConditionOrig,
"Run Condition", planstate, ancestors, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
@@ -2360,7 +2418,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2382,7 +2440,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2427,10 +2485,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3947,19 +4005,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4630,7 +4688,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4639,11 +4697,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4663,11 +4734,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -5922,3 +6006,393 @@ GetSerializationMetrics(DestReceiver *dest)
return empty;
}
+
+/*
+ * ProgressiveExplainSetup
+ * Adjusts QueryDesc with instrumentation for progressive explains.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we are going to adjust QueryDesc accordingly
+ * even if the query was not initiated with EXPLAIN ANALYZE. This
+ * will directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Adjust instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pe_es->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pe_es->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainFinish
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ ProgressiveExplainCleanup(queryDesc->pe_es->pe_a);
+}
+
+/*
+ * ProgressiveExplainCleanup
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures not automatically released by the memory
+ * context removal. Current tasks are:
+ * - remove local backend from progressive explain hash
+ * - detach from DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(dsa_area *a)
+{
+ progressiveExplainHashKey key;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ if (a)
+ dsa_detach(a);
+ hash_search(progressiveExplainArray, &key, HASH_REMOVE, NULL);
+ LWLockRelease(ExplainHashLock);
+}
+
+/*
+ * ProgressiveExplainBegin
+ * Initialization of all structures related to progressive explains.
+ *
+ * We define an ExplainState that will be reused in every iteration of
+ * plan prints.
+ *
+ * Progressive explain plans are printed to shared memory via DSAs. A
+ * dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer pointers.
+ *
+ * A memory context release callback is defined for manual resource release
+ * in case of query cancellations.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled.
+ */
+void
+ProgressiveExplainBegin(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ProgressiveExplainReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pe_es = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ key.pid = MyProcPid;
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_ENTER,
+ &found);
+
+ /* Define the DSA and share through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+ entry->h = dsa_get_handle(es->pe_a);
+ entry->p = (dsa_pointer) NULL;
+
+ LWLockRelease(ExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ GetCurrentTimestamp(),
+ progressive_explain_interval);
+
+ /* Printing progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainPrint
+ * Prints progressive explain in memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the currently printed plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ progressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pe_es;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+
+ /* Update the entry's shared plan data if it exists */
+ if (entry)
+ {
+ /* Plan was never printed */
+ if (!entry->p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->p);
+
+ /*
+ * Plan does not fit in existing shared memory area.
+ * Reallocation is needed.
+ */
+ if (strlen(es->str->data) >
+ add_size(strlen(pe_data->plan),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE))
+ {
+ dsa_free(es->pe_a, entry->p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently
+ * printed query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_ALLOC_SIZE. This strategy prevents
+ * having to reallocate the segment very often, which would be
+ * needed in case the length of the next printed plan exceeds the
+ * previously allocated size.
+ */
+ entry->p = dsa_allocate(es->pe_a,
+ add_size(sizeof(progressiveExplainData),
+ add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE)));
+ pe_data = dsa_get_address(es->pe_a, entry->p);
+ pe_data->pid = MyProcPid;
+ strcpy(pe_data->plan, "");
+ pe_data->last_print = 0;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_print = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ExplainHashLock);
+ }
+}
+
+/*
+ * ProgressiveExplainReleaseFunc
+ * Memory context release callback function to remove
+ * plan from explain hash and disable the timeout.
+ */
+static void
+ProgressiveExplainReleaseFunc(void *arg)
+{
+ /* Remove row from hash */
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+ LWLockRelease(ExplainHashLock);
+ if (entry)
+ ProgressiveExplainCleanup(NULL);
+
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(progressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(progressiveExplainHashKey);
+ info.entrysize = sizeof(progressiveExplainHashEntry);
+
+ progressiveExplainArray = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 3
+ HASH_SEQ_STATUS hash_seq;
+ progressiveExplainHashEntry *entry;
+ dsa_area *a;
+ progressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainArray);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ a = dsa_attach(entry->h);
+ ped = dsa_get_address(a, entry->p);
+
+ values[0] = ped->pid;
+ values[1] = TimestampTzGetDatum(ped->last_print);
+
+ if (superuser())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ {
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ if (beentry->st_userid == GetUserId())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ values[2] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 963aa390620..b7ff473a056 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -148,6 +149,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Setup progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -173,6 +180,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Adding back reference to QueryDesc
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -258,6 +270,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainBegin(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
}
@@ -445,6 +463,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..521a7b41404 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,9 +119,11 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
+static TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
static bool ExecShutdownNode_walker(PlanState *node, void *context);
@@ -461,8 +464,14 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
@@ -489,6 +498,31 @@ ExecProcNodeInstr(PlanState *node)
return result;
}
+/*
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+static TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update the progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
/* ----------------------------------------------------------------
* MultiExecProcNode
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed70367..15d8a3b28a8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up progressive explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index f1e74f184f1..1d713d942b4 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -166,6 +166,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f071628..890acf02da9 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -346,6 +346,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ExplainHash "Waiting to access backend explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 01bb6a410cb..41842b85e00 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -78,6 +79,7 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -741,6 +743,8 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
}
/*
@@ -1397,6 +1401,12 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index cce73314609..fc64eb9b9b4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -40,6 +40,7 @@
#include "catalog/storage.h"
#include "commands/async.h"
#include "commands/event_trigger.h"
+#include "commands/explain.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -474,6 +475,22 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -528,6 +545,14 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = false;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2116,6 +2141,61 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about buffers is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3771,6 +3851,18 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5274,6 +5366,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d472987ed46..a05a9cebf6f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -651,6 +651,18 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = off
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d7..bb4e514b7ea 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12464,4 +12464,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'bool',
+ proallargtypes => '{bool,int4,timestamptz,text}',
+ proargmodes => '{i,o,o,o}',
+ proargnames => '{mode,pid,last_print,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ea7419951f4..89b94dfcf67 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -13,6 +13,7 @@
#ifndef EXPLAIN_H
#define EXPLAIN_H
+#include "datatype/timestamp.h"
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
@@ -32,6 +33,13 @@ typedef enum ExplainFormat
EXPLAIN_FORMAT_YAML,
} ExplainFormat;
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
typedef struct ExplainWorkersState
{
int num_workers; /* # of worker processes the plan used */
@@ -67,12 +75,37 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
+
+ /* state related to progressive explains */
+ struct PlanState *pe_curr_node;
+ struct Instrumentation *pe_local_instr;
+ dsa_area *pe_a;
} ExplainState;
+typedef struct progressiveExplainHashKey
+{
+ int pid; /* PID */
+} progressiveExplainHashKey;
+
+typedef struct progressiveExplainHashEntry
+{
+ progressiveExplainHashKey key; /* hash key of entry - MUST BE FIRST */
+ dsa_handle h;
+ dsa_pointer p;
+} progressiveExplainHashEntry;
+
+typedef struct progressiveExplainData
+{
+ int pid;
+ TimestampTz last_print;
+ char plan[];
+} progressiveExplainData;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -144,4 +177,13 @@ extern void ExplainCloseGroup(const char *objtype, const char *labelname,
extern DestReceiver *CreateExplainSerializeDestReceiver(ExplainState *es);
+extern void ProgressiveExplainBegin(QueryDesc *queryDesc);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+
+extern bool ProgressiveExplainPending;
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..f9985ca7429 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -47,6 +47,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pe_es; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 2625d7e8222..0a8ab9109ae 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -56,6 +56,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -760,6 +761,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 2aa46fd50da..716623bde3a 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -215,6 +215,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf565452382..43f10a51862 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 1233e07d7da..b6326550ba3 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,14 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..f2751c5b4df 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,7 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 5baba8d39ff..b93c22773be 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,10 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT pid,
+ last_print,
+ query_plan
+ FROM pg_stat_progress_explain(true) pg_stat_progress_explain(pid, last_print, query_plan);
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 80aa50d55a4..0ae80866978 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2250,6 +2250,7 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
ProjectSet
ProjectSetPath
ProjectSetState
@@ -3847,6 +3848,9 @@ process_sublinks_context
proclist_head
proclist_mutable_iter
proclist_node
+progressiveExplainData
+progressiveExplainHashEntry
+progressiveExplainHashKey
promptStatus_t
pthread_barrier_t
pthread_cond_t
--
2.43.0
Fixed a corner case where pg_stat_progress_explain was looking at its own plan.
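
On the SQL side, a monitoring session will usually want to skip its own
backend anyway. A minimal sketch of that (the WHERE clause is just an
illustration, not something the patch requires; column names are the ones
defined in the attached patch):

    SELECT pid, last_print, query_plan
    FROM pg_stat_progress_explain
    WHERE pid <> pg_backend_pid();
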
The previous message in this thread contains all relevant implementation
details of the last patch.
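
For quick testing of this version, the flow is roughly the following (the
table is a throwaway example; GUC and view names are the ones defined in
the attached patch):

    -- session 1: run the query to be monitored
    SET progressive_explain = 'analyze';
    SET progressive_explain_timing = on;
    SET progressive_explain_interval = '500ms';
    SELECT count(*) FROM test t1 JOIN test t2 ON t1.c1 = t2.c1;

    -- session 2: follow the plan while session 1 is still running
    SELECT pid, last_print, query_plan FROM pg_stat_progress_explain;
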
On Tue, Feb 18, 2025 at 3:31 PM Rafael Thofehrn Castro <rafaelthca@gmail.com>
wrote:
Hello all,

Sending a new version of the patch that includes important changes addressing
feedback provided by Greg and Tomas. So, including the previous version (v5)
sent on Jan 29, these are the highlights of what has changed:

- Progressive plans printed on the regular interval defined by
progressive_explain_interval now use timeouts. GUC
progressive_explain_sample_rate is removed.

- Objects allocated per plan print (Instrumentation and ExplainState) were
replaced by reusable objects allocated at query start (during the progressive
explain setup phase). So currently we are allocating only 2 objects for the
complete duration of the feature. With that, removed the temporary memory
context that was being allocated per iteration.

- progressive_explain GUC was changed from boolean to enum, accepting the
values 'off', 'explain' and 'analyze'. This allows using instrumented
progressive explains for any query and not only the ones started via EXPLAIN
ANALYZE. If the GUC is set to 'explain' the plan will be printed only once at
query start. If set to 'analyze' instrumentation will be enabled in QueryDesc
and the detailed plan will be printed iteratively. Considering that now we can
enable instrumentation for regular queries, added the following GUCs to
control which instruments are enabled: progressive_explain_buffers,
progressive_explain_timing and progressive_explain_wal.

- Better handling of the shared memory space where plans are printed and
shared with other backends. In the previous version we had a shared hash with
elements holding all data related to progressive explains, including the
complete plan string:

typedef struct explainHashEntry
{
    explainHashKey key;         /* hash key of entry - MUST BE FIRST */
    int         pid;
    TimestampTz last_explain;
    int         explain_count;
    float       explain_duration;
    char        plan[];
} explainHashEntry;

The allocated size per element used to be defined by
progressive_explain_output_size, which would essentially control the space
available for plan[].

Greg raised the concern of PG having to allocate too much shared memory at
database start considering that we need enough space for max_connections +
max_parallel_workers, and that is a totally valid point.

So the new version takes advantage of DSAs. Each backend creates its own DSA
at query start (if progressive explain is enabled) where the shared data is
stored. That DSA is shared with other backends via the hash structure through
dsa_handle and dsa_pointer pointers:

typedef struct progressiveExplainHashEntry
{
    progressiveExplainHashKey key;      /* hash key of entry - MUST BE FIRST */
    dsa_handle  h;
    dsa_pointer p;
} progressiveExplainHashEntry;

typedef struct progressiveExplainData
{
    int         pid;
    TimestampTz last_print;
    char        plan[];
} progressiveExplainData;

That allows us to allocate areas of custom sizes for plan[]. The strategy
currently used is to allocate an initial space with the size of the initial
plan output + PROGRESSIVE_EXPLAIN_ALLOC_SIZE (4096 currently), which gives PG
enough room for subsequent iterations where the new string may increase a bit,
without having to reallocate space. The code checks sizes and will reallocate
if needed. With that, GUC progressive_explain_output_size was removed.

- Adjusted columns of pg_stat_progress_explain. Columns explain_count and
total_explain_time were removed. Column last_explain was renamed to
last_print. Column explain was renamed to query_plan, as this is the name used
by PG when a plan is printed with EXPLAIN.

Rafael.
Attachments:
v6-0001-Proposal-for-progressive-explains.patch
From 8c0533092fdba3a102188e552a6c6b43530f9e2e Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Tue, 7 Jan 2025 19:32:16 +0000
Subject: [PATCH v6] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan will be printed
only once at the beginning of the query. If set to 'analyze', the QueryDesc
will be adjusted to add instrumentation flags. In that case the plan
will be printed at a fixed interval controlled by the new GUC parameter
progressive_explain_interval, including all instrumentation stats
computed so far (per node rows and execution time).
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_print: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [text, xml, json, yaml]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
---
doc/src/sgml/config.sgml | 120 ++++
doc/src/sgml/monitoring.sgml | 64 ++
doc/src/sgml/perform.sgml | 95 +++
src/backend/catalog/system_views.sql | 5 +
src/backend/commands/explain.c | 588 ++++++++++++++++--
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 36 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 10 +
src/backend/utils/misc/guc_tables.c | 112 ++++
src/backend/utils/misc/postgresql.conf.sample | 12 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 42 ++
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 4 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 8 +
src/include/utils/timeout.h | 1 +
src/test/regress/expected/rules.out | 4 +
src/tools/pgindent/typedefs.list | 4 +
25 files changed, 1116 insertions(+), 56 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 336630ce417..ec44ccc693d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8411,6 +8411,126 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about buffers is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about per node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 928a6eb64b0..0a64ea78324 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6818,6 +6818,70 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describes the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_print</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index a502a2aaba3..349e8a66b65 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1109,6 +1109,101 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query can
+ be visualized by any session via <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query. Settings
+ <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/> and
+ <xref linkend="guc-progressive-explain-settings"/>
+ define how the plan is printed and which details are added there.
+ </para>
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:37:41.781459-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ will enable instrumentation, and the detailed plan is printed at a fixed interval
+ controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+ per node accumulated row count and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional information per node to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:36:03.580721-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=2010.504..2998.111 rows=38603 loops=1) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.068..303.963 rows=4928320 loops=1) (current)+
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=2009.824..2009.824 rows=10000000 loops=1) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.035..640.890 rows=10000000 loops=1) +
+ | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index eff0990957e..445aa317ad0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1338,6 +1338,11 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index dc4bef9ab81..c4e93830ddb 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
#include "commands/defrem.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
+#include "funcapi.h"
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
@@ -30,6 +31,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
+#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/guc_tables.h"
#include "utils/json.h"
@@ -37,17 +39,27 @@
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
+#include "utils/timeout.h"
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+
+
+#define PROGRESSIVE_EXPLAIN_ALLOC_SIZE 4096
+
/* Hook for plugins to get control in ExplainOneQuery() */
ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainArray = NULL;
+
+/* Flag set by timeouts to control when to print progressive explains */
+bool ProgressiveExplainPending = false;
/* Instrumentation data for SERIALIZE option */
typedef struct SerializeMetrics
@@ -140,7 +152,7 @@ static void show_hashagg_info(AggState *aggstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -180,6 +192,9 @@ static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
static void escape_yaml(StringInfo buf, const char *str);
static SerializeMetrics GetSerializationMetrics(DestReceiver *dest);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(dsa_area *a);
+static void ProgressiveExplainReleaseFunc(void *);
@@ -392,6 +407,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -1504,6 +1521,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1967,28 +1985,65 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is called directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may
+ * still be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
if (es->timing)
- appendStringInfo(es->str,
- " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
- startup_ms, total_ms, rows, nloops);
+ {
+ /* Node in progress */
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f) (current)",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
+ startup_ms, total_ms, rows, nloops);
+ }
else
- appendStringInfo(es->str,
- " (actual rows=%.0f loops=%.0f)",
- rows, nloops);
+ {
+ /* Node in progress */
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str,
+ " (actual rows=%.0f loops=%.0f) (current)",
+ rows, nloops);
+ else
+ appendStringInfo(es->str,
+ " (actual rows=%.0f loops=%.0f)",
+ rows, nloops);
+ }
}
else
{
@@ -1999,6 +2054,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
3, es);
}
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
}
@@ -2107,29 +2165,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -2140,11 +2198,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2161,7 +2219,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2172,7 +2230,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2196,7 +2254,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2230,7 +2288,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2244,7 +2302,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2262,7 +2320,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2279,14 +2337,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2296,7 +2354,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2306,11 +2364,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2319,11 +2377,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2332,11 +2390,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2344,13 +2402,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(((WindowAgg *) plan)->runConditionOrig,
"Run Condition", planstate, ancestors, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
@@ -2360,7 +2418,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2382,7 +2440,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2427,10 +2485,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3947,19 +4005,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4630,7 +4688,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4639,11 +4697,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4663,11 +4734,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -5922,3 +6006,401 @@ GetSerializationMetrics(DestReceiver *dest)
return empty;
}
+
+/*
+ * ProgressiveExplainSetup
+ * Adjusts QueryDesc with instrumentation for progressive explains.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we are going to adjust QueryDesc accordingly
+ * even if the query was not initiated with EXPLAIN ANALYZE. This
+ * will directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Adjust instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pe_es->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pe_es->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainFinish
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ ProgressiveExplainCleanup(queryDesc->pe_es->pe_a);
+}
+
+/*
+ * ProgressiveExplainCleanup
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures not automatically released by the memory
+ * context removal. Current tasks are:
+ * - remove local backend from progressive explain hash
+ * - detach from DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(dsa_area *a)
+{
+ progressiveExplainHashKey key;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ if (a)
+ dsa_detach(a);
+ hash_search(progressiveExplainArray, &key, HASH_REMOVE, NULL);
+ LWLockRelease(ExplainHashLock);
+}
+
+/*
+ * ProgressiveExplainBegin
+ * Initialization of all structures related to progressive explains.
+ *
+ * We define an ExplainState that will be reused in every iteration of
+ * plan prints.
+ *
+ * Progressive explain plans are printed in shared memory via DSAs. A
+ * dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer pointers.
+ *
+ * A memory context release callback is defined for manual resource release
+ * in case of query cancellations.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled.
+ */
+void
+ProgressiveExplainBegin(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ProgressiveExplainReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pe_es = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ key.pid = MyProcPid;
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_ENTER,
+ &found);
+
+ /* Define the DSA and share through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+ entry->h = dsa_get_handle(es->pe_a);
+ entry->p = (dsa_pointer) NULL;
+
+ LWLockRelease(ExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ GetCurrentTimestamp(),
+ progressive_explain_interval);
+
+ /* Printing progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainPrint
+ * Prints progressive explain in memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the currently printed plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ progressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pe_es;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+
+ /* Entry exists */
+ if (entry)
+ {
+ /* Plan was never printed */
+ if (!entry->p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->p);
+
+ /*
+ * Plan does not fit in existing shared memory area.
+ * Reallocation is needed.
+ */
+ if (strlen(es->str->data) >
+ add_size(strlen(pe_data->plan),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE))
+ {
+ dsa_free(es->pe_a, entry->p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently
+ * printed query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_ALLOC_SIZE. This strategy prevents
+ * having to reallocate the segment very often, which would be
+ * needed in case the length of the next printed plan exceeds the
+ * previously allocated size.
+ */
+ entry->p = dsa_allocate(es->pe_a,
+ add_size(sizeof(progressiveExplainData),
+ add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE)));
+ pe_data = dsa_get_address(es->pe_a, entry->p);
+ pe_data->pid = MyProcPid;
+ strcpy(pe_data->plan, "");
+ pe_data->last_print = 0;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_print = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ExplainHashLock);
+ }
+}
+
+/*
+ * ProgressiveExplainReleaseFunc
+ * Memory context release callback function to remove
+ * plan from explain hash and disable the timeout.
+ */
+static void
+ProgressiveExplainReleaseFunc(void *arg)
+{
+ /* Remove row from hash */
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+ LWLockRelease(ExplainHashLock);
+ if (entry)
+ ProgressiveExplainCleanup(NULL);
+
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(progressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(progressiveExplainHashKey);
+ info.entrysize = sizeof(progressiveExplainHashEntry);
+
+ progressiveExplainArray = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 3
+ HASH_SEQ_STATUS hash_seq;
+ progressiveExplainHashEntry *entry;
+ dsa_area *a;
+ progressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainArray);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->p) ||
+ MyProcPid == entry->key.pid)
+ continue;
+
+ a = dsa_attach(entry->h);
+ ped = dsa_get_address(a, entry->p);
+
+ values[0] = ped->pid;
+ values[1] = TimestampTzGetDatum(ped->last_print);
+
+ if (superuser())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ {
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ if (beentry->st_userid == GetUserId())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ values[2] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 963aa390620..b7ff473a056 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -148,6 +149,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Setup progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -173,6 +180,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Add a back reference to the QueryDesc.
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -258,6 +270,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainBegin(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
}
@@ -445,6 +463,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..521a7b41404 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,9 +119,11 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
+static TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
static bool ExecShutdownNode_walker(PlanState *node, void *context);
@@ -461,8 +464,14 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
@@ -489,6 +498,31 @@ ExecProcNodeInstr(PlanState *node)
return result;
}
+/*
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+static TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update the progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
/* ----------------------------------------------------------------
* MultiExecProcNode
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed70367..15d8a3b28a8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up instrumented explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index f1e74f184f1..1d713d942b4 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -166,6 +166,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f071628..890acf02da9 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -346,6 +346,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ExplainHash "Waiting to access backend explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 01bb6a410cb..41842b85e00 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -78,6 +79,7 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -741,6 +743,8 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
}
/*
@@ -1397,6 +1401,12 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index cce73314609..fc64eb9b9b4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -40,6 +40,7 @@
#include "catalog/storage.h"
#include "commands/async.h"
#include "commands/event_trigger.h"
+#include "commands/explain.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -474,6 +475,22 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -528,6 +545,14 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = false;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2116,6 +2141,61 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about buffers is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3771,6 +3851,18 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5274,6 +5366,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d472987ed46..a05a9cebf6f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -651,6 +651,18 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = off
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d7..bb4e514b7ea 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12464,4 +12464,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'bool',
+ proallargtypes => '{bool,int4,timestamptz,text}',
+ proargmodes => '{i,o,o,o}',
+ proargnames => '{mode,pid,last_print,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ea7419951f4..89b94dfcf67 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -13,6 +13,7 @@
#ifndef EXPLAIN_H
#define EXPLAIN_H
+#include "datatype/timestamp.h"
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
@@ -32,6 +33,13 @@ typedef enum ExplainFormat
EXPLAIN_FORMAT_YAML,
} ExplainFormat;
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
typedef struct ExplainWorkersState
{
int num_workers; /* # of worker processes the plan used */
@@ -67,12 +75,37 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
+
+ /* state related to progressive explains */
+ struct PlanState *pe_curr_node;
+ struct Instrumentation *pe_local_instr;
+ dsa_area *pe_a;
} ExplainState;
+typedef struct progressiveExplainHashKey
+{
+ int pid; /* PID */
+} progressiveExplainHashKey;
+
+typedef struct progressiveExplainHashEntry
+{
+ progressiveExplainHashKey key; /* hash key of entry - MUST BE FIRST */
+ dsa_handle h;
+ dsa_pointer p;
+} progressiveExplainHashEntry;
+
+typedef struct progressiveExplainData
+{
+ int pid;
+ TimestampTz last_print;
+ char plan[];
+} progressiveExplainData;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -144,4 +177,13 @@ extern void ExplainCloseGroup(const char *objtype, const char *labelname,
extern DestReceiver *CreateExplainSerializeDestReceiver(ExplainState *es);
+extern void ProgressiveExplainBegin(QueryDesc *queryDesc);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+
+extern bool ProgressiveExplainPending;
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..f9985ca7429 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -47,6 +47,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pe_es; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 2625d7e8222..0a8ab9109ae 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -56,6 +56,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -760,6 +761,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 2aa46fd50da..716623bde3a 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -215,6 +215,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf565452382..43f10a51862 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 1233e07d7da..b6326550ba3 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,14 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..f2751c5b4df 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,7 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 5baba8d39ff..b93c22773be 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,10 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT pid,
+ last_print,
+ query_plan
+ FROM pg_stat_progress_explain(true) pg_stat_progress_explain(pid, last_print, query_plan);
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 80aa50d55a4..0ae80866978 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2250,6 +2250,7 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
ProjectSet
ProjectSetPath
ProjectSetState
@@ -3847,6 +3848,9 @@ process_sublinks_context
proclist_head
proclist_mutable_iter
proclist_node
+progressiveExplainData
+progressiveExplainHashEntry
+progressiveExplainHashKey
promptStatus_t
pthread_barrier_t
pthread_cond_t
--
2.43.0
Hello,
I was recently benchmarking the latest version of the patch and found room for
improvement when the GUC progressive_explain is enabled globally.
Results with the last published version of the patch:
- progressive_explain = off:
/usr/local/pgsql/bin/pgbench -S -n -T 10 -c 30
tps = 18249.363540 (without initial connection time)
- progressive_explain = 'explain':
/usr/local/pgsql/bin/pgbench -S -n -T 10 -c 30
tps = 3536.635125 (without initial connection time)
This is because progressive explains are being printed for every query,
including the ones that finish instantly.
If we think about it, those printed plans for instant queries are useless as
other backends won't have a chance to look at the plans before they get
removed from pg_stat_progress_explain.
So this new version of the patch implements a new GUC,
progressive_explain_min_duration, to be used as a threshold before the plan
is printed for the first time:
- progressive_explain_min_duration: min query duration until progressive
explain starts.
- type: int
- default: 1s
- min: 0
- context: user
Results with the new version:
- progressive_explain = off:
/usr/local/pgsql/bin/pgbench -S -n -T 10 -c 30
tps = 18871.800242 (without initial connection time)
- progressive_explain = 'explain' and progressive_explain_min_duration =
'5s':
/usr/local/pgsql/bin/pgbench -S -n -T 10 -c 30
tps = 18896.266538 (without initial connection time)
Implementation of the new GUC progressive_explain_min_duration was done with
timeouts. The timeout callback function is used to initialize the
progressive explain.
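Condensed from the attached v7 patch, the timeout-based startup path looks
roughly like the sketch below (registration of the timeout handler and error
handling are omitted here):

/* Called from standard_ExecutorStart() when progressive_explain is enabled */
void
ProgressiveExplainStart(QueryDesc *queryDesc)
{
    activeQueryDesc = queryDesc;

    /* Print immediately, or defer via a timeout until min duration elapses */
    if (progressive_explain_min_duration == 0)
        ProgressiveExplainInit(queryDesc);
    else
        enable_timeout_after(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
                             progressive_explain_min_duration);
}

/* Timeout callback: the query has run long enough, start the progressive explain */
void
ProgressiveExplainTrigger(void)
{
    ProgressiveExplainInit(activeQueryDesc);
}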
There is a catch to this implementation. In the thread
/messages/by-id/d68c3ae31672664876b22d2dcbb526d2@postgrespro.ru
where torikoshia proposes logging of query plans, concerns were raised about
logging plans in the CFI, a sensitive part of the code. So torikoshia
implemented a smart workaround consisting of adjusting the execProcNode
wrapper of all nodes so that the plan printing can be done there.
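For illustration, the attached v7 patch already uses an execProcNode wrapper of
this shape for the periodic re-prints: the timeout handler only sets a flag,
and the plan is actually printed from the executor path (condensed from the
patch, comments shortened):

TupleTableSlot *
ExecProcNodeInstrExplain(PlanState *node)
{
    TupleTableSlot *result;

    InstrStartNode(node->instrument);

    /* Print/refresh the progressive explain once the timeout has fired */
    if (ProgressiveExplainPending)
        ProgressiveExplainUpdate(node);

    result = node->ExecProcNodeReal(node);

    InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);

    return result;
}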
I'm not sure whether this same concern applies to timeout callbacks, so I also
implemented a second version of the latest patch that uses that execProcNode
wrapper strategy.
The wrapper code was implemented by torikoshia (torikoshia@oss.nttdata.com),
so I'm adding the credits here.
Rafael.
Attachments:
v7-0001-Proposal-for-progressive-explains.patch (application/octet-stream)
From 01c9b23fc3f492f6aa3b33039cb94916d040b097 Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Thu, 6 Mar 2025 17:34:16 -0300
Subject: [PATCH v7] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan will be printed
only once at the beginning of the query. If set to 'analyze' the QueryDesc
will be adjusted to add instrumentation flags. In that case the plan
will be printed on a fixed interval controlled by the new GUC parameter
progressive_explain_interval, including all instrumentation stats
computed so far (per node rows and execution time).
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_print: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_min_duration: min query duration until progressive
explain starts.
- type: int
- default: 1s
- min: 0
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [TEXT, XML, JSON, or YAML]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
---
doc/src/sgml/config.sgml | 134 ++++
doc/src/sgml/monitoring.sgml | 64 ++
doc/src/sgml/perform.sgml | 96 +++
src/backend/catalog/system_views.sql | 5 +
src/backend/commands/explain.c | 635 ++++++++++++++++--
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 10 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 19 +
src/backend/utils/misc/guc_tables.c | 125 ++++
src/backend/utils/misc/postgresql.conf.sample | 13 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 44 ++
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 9 +
src/include/utils/timeout.h | 2 +
.../test_misc/t/008_progressive_explain.pl | 130 ++++
src/test/regress/expected/rules.out | 4 +
src/tools/pgindent/typedefs.list | 4 +
26 files changed, 1317 insertions(+), 50 deletions(-)
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d2fa5f7d1a9..6f0b0d06c7b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8483,6 +8483,140 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum time (in milliseconds) a query must run before its
+ progressive explain is printed for the first time. The default is
+ <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about buffers is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about per node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 16646f560e8..8a9934d53e1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6793,6 +6793,70 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describes the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_print</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index be4b49f62b5..50eaf5533a4 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1109,6 +1109,102 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query can
+ be visualized by any session via the <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query, and after the
+ minimum duration specified by <xref linkend="guc-progressive-explain-min-duration"/>
+ has passed. Settings <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/> and
+ <xref linkend="guc-progressive-explain-settings"/>
+ define how the plan is printed and which details are added there.
+ </para>
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:37:41.781459-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ will enable instrumentation and the detailed plan is printed on a fixed interval
+ controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+ per node accumulated row count and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional information per node to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:36:03.580721-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=2010.504..2998.111 rows=38603 loops=1) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.068..303.963 rows=4928320 loops=1) (current)+
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=2009.824..2009.824 rows=10000000 loops=1) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.035..640.890 rows=10000000 loops=1) +
+ | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a4d2cfdcaf5..dd746078ea7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,11 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d8a7232cedb..ad05560462d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -21,6 +21,7 @@
#include "commands/explain_format.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
+#include "funcapi.h"
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
@@ -32,6 +33,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
+#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/guc_tables.h"
#include "utils/json.h"
@@ -39,17 +41,28 @@
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
+#include "utils/timeout.h"
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+#define PROGRESSIVE_EXPLAIN_ALLOC_SIZE 4096
+
/* Hook for plugins to get control in ExplainOneQuery() */
ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainArray = NULL;
+
+/* Pointer to running query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Flag set by timeouts to control when to print progressive explains */
+bool ProgressiveExplainPending = false;
/*
* Various places within need to convert bytes to kilobytes. Round these up
@@ -128,7 +141,7 @@ static void show_hashagg_info(AggState *aggstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -154,6 +167,10 @@ static ExplainWorkersState *ExplainCreateWorkersState(int num_workers);
static void ExplainOpenWorker(int n, ExplainState *es);
static void ExplainCloseWorker(int n, ExplainState *es);
static void ExplainFlushWorkersState(ExplainState *es);
+static void ProgressiveExplainInit(QueryDesc *queryDesc);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(QueryDesc *queryDesc);
+static void ProgressiveExplainReleaseFunc(void *);
@@ -366,6 +383,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -1490,6 +1509,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1953,17 +1973,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is called directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1973,6 +2014,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
}
else
{
@@ -1985,6 +2029,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
}
}
else if (es->analyze)
@@ -2089,29 +2137,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -2122,11 +2170,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2143,7 +2191,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2154,7 +2202,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2178,7 +2226,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2212,7 +2260,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2226,7 +2274,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2244,7 +2292,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2261,14 +2309,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2278,7 +2326,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2288,11 +2336,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2301,11 +2349,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2314,11 +2362,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2326,13 +2374,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(((WindowAgg *) plan)->runConditionOrig,
"Run Condition", planstate, ancestors, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
@@ -2342,7 +2390,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2364,7 +2412,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2409,10 +2457,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3929,19 +3977,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4612,7 +4660,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4621,11 +4669,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4645,11 +4706,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -4931,3 +5005,470 @@ ExplainFlushWorkersState(ExplainState *es)
pfree(wstate->worker_state_save);
pfree(wstate);
}
+
+/*
+ * ProgressiveExplainSetup
+ * Adjusts QueryDesc with instrumentation for progressive explains.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we adjust QueryDesc accordingly even if
+ * the query was not initiated with EXPLAIN ANALYZE. This will
+ * directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Adjust instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+}
+
+/*
+ * ProgressiveExplainStart
+ * Progressive explain start point.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ activeQueryDesc = queryDesc;
+ queryDesc->pe_es = NULL;
+
+ /* Timeout is only needed if duration > 0 */
+ if (progressive_explain_min_duration == 0)
+ ProgressiveExplainInit(queryDesc);
+ else
+ enable_timeout_after(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ progressive_explain_min_duration);
+}
+
+/*
+ * ProgressiveExplainInit
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ * We define an ExplainState that will be reused in every iteration of
+ * plan prints.
+ *
+ * Progressive explain plans are printed in shared memory via DSAs. A
+ * dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer pointers.
+ *
+ * A memory context release callback is defined for manual resource release
+ * in case of query cancellations.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise
+ * the plain plan is printed once.
+ */
+void
+ProgressiveExplainInit(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ProgressiveExplainReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pe_es = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+
+ /* Define the DSA and share through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ key.pid = MyProcPid;
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_ENTER,
+ &found);
+
+ entry->h = dsa_get_handle(es->pe_a);
+ entry->p = (dsa_pointer) NULL;
+
+ LWLockRelease(ExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+ /* Printing progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainTrigger
+ * Called by the timeout handler to start printing progressive
+ * explain plans.
+ */
+void
+ProgressiveExplainTrigger(void)
+{
+ /* Initialize progressive explain */
+ ProgressiveExplainInit(activeQueryDesc);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pe_es->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pe_es->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint
+ * Prints progressive explain in memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. Amount of shared
+ * memory allocated depends on size of currently printed plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ progressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pe_es;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+
+ /* Entry exists */
+ if (entry)
+ {
+ /* Plan was never printed */
+ if (!entry->p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->p);
+
+ /*
+ * Plan does not fit in existing shared memory area.
+ * Reallocation is needed.
+ */
+ if (strlen(es->str->data) >
+ add_size(strlen(pe_data->plan),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE))
+ {
+ dsa_free(es->pe_a, entry->p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently
+ * printed query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_ALLOC_SIZE. This strategy prevents
+ * having to reallocate the segment very often, which would be
+ * needed in case the length of the next printed plan exceeds the
+ * previously allocated size.
+ */
+ entry->p = dsa_allocate(es->pe_a,
+ add_size(sizeof(progressiveExplainData),
+ add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE)));
+ pe_data = dsa_get_address(es->pe_a, entry->p);
+ pe_data->pid = MyProcPid;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_print = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ExplainHashLock);
+ }
+}
+
+/*
+ * ProgressiveExplainFinish
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ ProgressiveExplainCleanup(queryDesc);
+}
+
+/*
+ * ProgressiveExplainCleanup
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures not automatically released by the memory
+ * context removal. Current tasks are:
+ * - remove local backend from progressive explain hash
+ * - detach from DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(QueryDesc *queryDesc)
+{
+ progressiveExplainHashKey key;
+
+ /* Startup timeout hasn't triggered yet, just disable it */
+ if (get_timeout_active(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT))
+ {
+ disable_timeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT, false);
+ }
+ /* Initial progressive explain was done, clean everything */
+ else
+ {
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ if (queryDesc && queryDesc->pe_es->pe_a)
+ dsa_detach(queryDesc->pe_es->pe_a);
+ hash_search(progressiveExplainArray, &key, HASH_REMOVE, NULL);
+ LWLockRelease(ExplainHashLock);
+ }
+}
+
+/*
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update the progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * ProgressiveExplainReleaseFunc
+ * Memory context release callback function to remove
+ * plan from explain hash and disable the timeout.
+ */
+static void
+ProgressiveExplainReleaseFunc(void *arg)
+{
+ /* Remove row from hash */
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+ LWLockRelease(ExplainHashLock);
+ if (entry)
+ ProgressiveExplainCleanup(NULL);
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(progressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(progressiveExplainHashKey);
+ info.entrysize = sizeof(progressiveExplainHashEntry);
+
+ progressiveExplainArray = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 3
+ HASH_SEQ_STATUS hash_seq;
+ progressiveExplainHashEntry *entry;
+ dsa_area *a;
+ progressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainArray);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->p) ||
+ MyProcPid == entry->key.pid)
+ continue;
+
+ a = dsa_attach(entry->h);
+ ped = dsa_get_address(a, entry->p);
+
+ values[0] = ped->pid;
+ values[1] = TimestampTzGetDatum(ped->last_print);
+
+ if (superuser())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ {
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ if (beentry->st_userid == GetUserId())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ values[2] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0493b7d5365..52b8b2bd1f7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -157,6 +158,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Setup progressive explain if enabled.
+ */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -182,6 +189,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Adding back reference to QueryDesc
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -267,6 +279,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -516,6 +534,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..3af8e9d964d 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -461,8 +463,14 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed70367..15d8a3b28a8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up instrumented explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 8adf2730277..2b4393d3635 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -176,6 +176,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f071628..890acf02da9 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -346,6 +346,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ExplainHash "Waiting to access backend explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ee1a9d5d98b..6aee6f08b75 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -80,6 +81,8 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainStartupTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -757,6 +760,10 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ ProgressiveExplainStartupTimeoutHandler);
}
/*
@@ -1418,6 +1425,18 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainStartupTimeoutHandler(void)
+{
+ ProgressiveExplainTrigger();
+}
+
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index ad25cbb39c5..e6cd1d8781c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -40,6 +40,7 @@
#include "catalog/storage.h"
#include "commands/async.h"
#include "commands/event_trigger.h"
+#include "commands/explain.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -476,6 +477,22 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -530,6 +547,15 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = false;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+int progressive_explain_min_duration = 1000;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2118,6 +2144,61 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about buffers is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3785,6 +3866,30 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_min_duration", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Min query duration to start printing instrumented "
+ "progressive explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_min_duration,
+ 1000, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5299,6 +5404,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2d1de9c37bd..7da505564d4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -655,6 +655,19 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_min_duration = 1s
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = off
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 134b3dd8689..67be2633e50 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12450,4 +12450,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'bool',
+ proallargtypes => '{bool,int4,timestamptz,text}',
+ proargmodes => '{i,o,o,o}',
+ proargnames => '{mode,pid,last_print,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 64547bd9b9c..50e90815936 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -13,6 +13,7 @@
#ifndef EXPLAIN_H
#define EXPLAIN_H
+#include "datatype/timestamp.h"
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
@@ -32,6 +33,13 @@ typedef enum ExplainFormat
EXPLAIN_FORMAT_YAML,
} ExplainFormat;
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
typedef struct ExplainWorkersState
{
int num_workers; /* # of worker processes the plan used */
@@ -67,12 +75,37 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
+
+ /* state related to progressive explains */
+ struct PlanState *pe_curr_node;
+ struct Instrumentation *pe_local_instr;
+ dsa_area *pe_a;
} ExplainState;
+typedef struct progressiveExplainHashKey
+{
+ int pid; /* PID */
+} progressiveExplainHashKey;
+
+typedef struct progressiveExplainHashEntry
+{
+ progressiveExplainHashKey key; /* hash key of entry - MUST BE FIRST */
+ dsa_handle h;
+ dsa_pointer p;
+} progressiveExplainHashEntry;
+
+typedef struct progressiveExplainData
+{
+ int pid;
+ TimestampTz last_print;
+ char plan[];
+} progressiveExplainData;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -120,4 +153,15 @@ extern void ExplainPrintJITSummary(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainQueryText(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int maxlen);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+extern bool ProgressiveExplainPending;
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..a6d2362c6c9 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pe_es; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a323fa98bbb..3ace9a88636 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -763,6 +764,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1159,6 +1163,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+ ExecProcNodeMtd ExecProcNodeOriginal; /* pointer to original function
+ * another wrapper was added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 13a7dc89980..eef4af1a3e7 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -217,6 +217,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf565452382..43f10a51862 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 1233e07d7da..365c933ab00 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,15 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_min_duration;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..ea66a0505d9 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,8 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..db6adfa89d8
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the local
+# session, even if progressive explain is enabled. For other sessions pg_stat_progress_explain
+# should contain data for a PID only if progressive_explain is enabled and a query
+# is running. Data needs to be removed when the query finishes (or gets cancelled).
+
+use strict;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
+# Configure progressive explain to be printed immediately
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled pg_stat_progress_explain should not contain
+ # row for PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # 1 row for pid is expected for running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('explain');
+test_local_session('analyze');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('explain');
+test_peer_session('analyze');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 62f69ac20b2..4968db5607f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,10 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT pid,
+ last_print,
+ query_plan
+ FROM pg_stat_progress_explain(true) pg_stat_progress_explain(pid, last_print, query_plan);
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9840060997f..f1a74454187 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2267,6 +2267,7 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
ProjectSet
ProjectSetPath
ProjectSetState
@@ -3870,6 +3871,9 @@ process_sublinks_context
proclist_head
proclist_mutable_iter
proclist_node
+progressiveExplainData
+progressiveExplainHashEntry
+progressiveExplainHashKey
promptStatus_t
pthread_barrier_t
pthread_cond_t
--
2.43.0
Attachment: v7-0001-Proposal-for-progressive-explains-with-execprocnode-wrapper.patch.nocfbot (application/octet-stream)
From 918554a966c4cf45da330ce9b8e2356cbc152fb7 Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Thu, 6 Mar 2025 17:34:16 -0300
Subject: [PATCH v7] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan is printed only
once, at the beginning of the query. If set to 'analyze' the QueryDesc
is adjusted to add instrumentation flags. In that case the plan is
printed at a fixed interval controlled by the new GUC parameter
progressive_explain_interval, including all instrumentation stats
computed so far (per-node rows and execution time).
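As a minimal sketch of the two modes (using only the GUCs described in
this message; '500ms' is an arbitrary interval chosen for illustration):

  -- Print the plan once, near the start of the query
  -- (after progressive_explain_min_duration has passed):
  SET progressive_explain = 'explain';

  -- Print an instrumented plan every 500ms while the query runs:
  SET progressive_explain = 'analyze';
  SET progressive_explain_interval = '500ms';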
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_print: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_min_duration: minimum query duration before the
progressive explain starts being printed.
- type: int
- default: 1s
- min: 0
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [text, xml, json, yaml]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
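As a usage sketch tying the view and the GUCs above together (the table
name 'test' is only illustrative):

  -- Session 1: enable instrumented progressive explain and run a query
  SET progressive_explain = 'analyze';
  SELECT count(*) FROM test t1 JOIN test t2 ON t1.c1 = t2.c1;

  -- Session 2: watch the evolving plan of session 1
  SELECT pid, last_print, query_plan FROM pg_stat_progress_explain;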
---
doc/src/sgml/config.sgml | 134 +++
doc/src/sgml/monitoring.sgml | 64 ++
doc/src/sgml/perform.sgml | 96 ++
src/backend/catalog/system_views.sql | 5 +
src/backend/commands/explain.c | 822 +++++++++++++++++-
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 10 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 19 +
src/backend/utils/misc/guc_tables.c | 125 +++
src/backend/utils/misc/postgresql.conf.sample | 13 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 44 +
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 9 +
src/include/utils/timeout.h | 2 +
.../test_misc/t/008_progressive_explain.pl | 130 +++
src/test/regress/expected/rules.out | 4 +
src/tools/pgindent/typedefs.list | 4 +
26 files changed, 1504 insertions(+), 50 deletions(-)
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d2fa5f7d1a9..6f0b0d06c7b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8483,6 +8483,140 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum time (in milliseconds) a query must run before its
+ progressive explain is printed for the first time. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about buffers is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about per node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 16646f560e8..8a9934d53e1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6793,6 +6793,70 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_print</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index be4b49f62b5..50eaf5533a4 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1109,6 +1109,102 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query can
+ be visualized by any session via the <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query and after the minimum
+ duration specified by <xref linkend="guc-progressive-explain-min-duration"/> has
+ passed. Settings <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/> and
+ <xref linkend="guc-progressive-explain-settings"/>
+ define how the plan is printed and which additional details are included.
+ </para>
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:37:41.781459-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ enables instrumentation, and the detailed plan is printed at a fixed interval
+ controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+ per-node accumulated row counts and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional per-node information to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:36:03.580721-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=2010.504..2998.111 rows=38603 loops=1) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.068..303.963 rows=4928320 loops=1) (current)+
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=2009.824..2009.824 rows=10000000 loops=1) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.035..640.890 rows=10000000 loops=1) +
+ | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a4d2cfdcaf5..dd746078ea7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,11 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d8a7232cedb..7fd6d664bb5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -21,6 +21,7 @@
#include "commands/explain_format.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
+#include "funcapi.h"
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
@@ -32,6 +33,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
+#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/guc_tables.h"
#include "utils/json.h"
@@ -39,17 +41,28 @@
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
+#include "utils/timeout.h"
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+#define PROGRESSIVE_EXPLAIN_ALLOC_SIZE 4096
+
/* Hook for plugins to get control in ExplainOneQuery() */
ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainArray = NULL;
+
+/* Pointer to running query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Flag set by timeouts to control when to print progressive explains */
+bool ProgressiveExplainPending = false;
/*
* Various places within need to convert bytes to kilobytes. Round these up
@@ -128,7 +141,7 @@ static void show_hashagg_info(AggState *aggstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -154,6 +167,13 @@ static ExplainWorkersState *ExplainCreateWorkersState(int num_workers);
static void ExplainOpenWorker(int n, ExplainState *es);
static void ExplainCloseWorker(int n, ExplainState *es);
static void ExplainFlushWorkersState(ExplainState *es);
+static void ProgressiveExplainInit(QueryDesc *queryDesc);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(QueryDesc *queryDesc);
+static TupleTableSlot *ExecProcNodeExplain(PlanState *node);
+static void WrapExecProcNodeWithExplain(PlanState *ps);
+static void UnwrapExecProcNodeWithExplain(PlanState *ps);
+static void ProgressiveExplainReleaseFunc(void *);
@@ -366,6 +386,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -1490,6 +1512,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1953,17 +1976,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is done directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1973,6 +2017,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
}
else
{
@@ -1985,6 +2032,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
}
}
else if (es->analyze)
@@ -2089,29 +2140,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -2122,11 +2173,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2143,7 +2194,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2154,7 +2205,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2178,7 +2229,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2212,7 +2263,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2226,7 +2277,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2244,7 +2295,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2261,14 +2312,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2278,7 +2329,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2288,11 +2339,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2301,11 +2352,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2314,11 +2365,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2326,13 +2377,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(((WindowAgg *) plan)->runConditionOrig,
"Run Condition", planstate, ancestors, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
@@ -2342,7 +2393,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2364,7 +2415,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2409,10 +2460,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3929,19 +3980,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4612,7 +4663,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4621,11 +4672,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4645,11 +4709,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -4931,3 +5008,654 @@ ExplainFlushWorkersState(ExplainState *es)
pfree(wstate->worker_state_save);
pfree(wstate);
}
+
+/*
+ * ProgressiveExplainSetup
+ * Adjusts QueryDesc with instrumentation for progressive explains.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we adjust QueryDesc accordingly even if
+ * the query was not initiated with EXPLAIN ANALYZE. This will
+ * directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Adjust instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+}
+
+/*
+ * ProgressiveExplainStart
+ * Progressive explain start point.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ activeQueryDesc = queryDesc;
+ queryDesc->pe_es = NULL;
+
+ /* Timeout is only needed if duration > 0 */
+ if (progressive_explain_min_duration == 0)
+ ProgressiveExplainInit(queryDesc);
+ else
+ enable_timeout_after(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ progressive_explain_min_duration);
+}
+
+/*
+ * ProgressiveExplainInit
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ * We define an ExplainState that will be reused in every iteration of
+ * plan prints.
+ *
+ * Progressive explain plans are printed in shared memory via DSAs. A
+ * dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer pointers.
+ *
+ * A memory context release callback is defined for manual resource release
+ * in case of query cancellation.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise
+ * the plain plan is printed only once.
+ */
+void
+ProgressiveExplainInit(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ProgressiveExplainReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pe_es = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+
+ /* Define the DSA and share through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ key.pid = MyProcPid;
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_ENTER,
+ &found);
+
+ entry->h = dsa_get_handle(es->pe_a);
+ entry->p = InvalidDsaPointer;
+
+ LWLockRelease(ExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+ /* Printing progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainTrigger
+ * Called by the timeout handler to start printing progressive
+ * explain plans.
+ */
+void
+ProgressiveExplainTrigger(void)
+{
+ WrapExecProcNodeWithExplain(activeQueryDesc->planstate);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pe_es->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pe_es->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint
+ * Prints progressive explain in memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the currently printed plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ progressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pe_es;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+
+ /* Entry exists */
+ if (entry)
+ {
+ /* Plan was never printed */
+ if (!entry->p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->p);
+
+ /*
+ * Plan does not fit in existing shared memory area.
+ * Reallocation is needed.
+ */
+ if (strlen(es->str->data) >
+ add_size(strlen(pe_data->plan),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE))
+ {
+ dsa_free(es->pe_a, entry->p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently
+ * printed query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_ALLOC_SIZE. This strategy prevents
+ * having to reallocate the segment very often, which would be
+ * needed in case the length of the next printed plan exceeds the
+ * previously allocated size.
+ */
+ entry->p = dsa_allocate(es->pe_a,
+ add_size(sizeof(progressiveExplainData),
+ add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE)));
+ pe_data = dsa_get_address(es->pe_a, entry->p);
+ pe_data->pid = MyProcPid;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_print = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ExplainHashLock);
+ }
+}
+
+/*
+ * ProgressiveExplainFinish
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ ProgressiveExplainCleanup(queryDesc);
+}
+
+/*
+ * ProgressiveExplainCleanup
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures not automatically released by the memory
+ * context removal. Current tasks are:
+ * - remove local backend from progressive explain hash
+ * - detach from DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(QueryDesc *queryDesc)
+{
+ progressiveExplainHashKey key;
+
+ /* Startup timeout hasn't triggered yet, just disable it */
+ if (get_timeout_active(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT))
+ {
+ disable_timeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT, false);
+ }
+ /* Initial progressive explain was done, clean everything */
+ else
+ {
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ if (queryDesc && queryDesc->pe_es->pe_a)
+ dsa_detach(queryDesc->pe_es->pe_a);
+ hash_search(progressiveExplainArray, &key, HASH_REMOVE, NULL);
+ LWLockRelease(ExplainHashLock);
+ }
+}
+
+/*
+ * ExecProcNodeExplain
+ * ExecProcNode wrapper that initializes progressive explain
+ * and unwraps ExecProcNode back to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)
+{
+ /* Initialize progressive explain */
+ ProgressiveExplainInit(node->state->query_desc);
+
+ /* Unwrap exec proc node for all nodes */
+ UnwrapExecProcNodeWithExplain(node->state->query_desc->planstate);
+
+ /*
+ * Since unwrapping has already been done, call ExecProcNode(), not
+ * ExecProcNodeOriginal().
+ */
+ return node->ExecProcNode(node);
+}
+
+/*
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * WrapMultiExecProcNodesWithExplain -
+ * Wrap an array of PlanStates' ExecProcNodes with ExecProcNodeExplain
+ */
+static void
+WrapMultiExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ WrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * WrapCustomPlanChildExecProcNodesWithExplain -
+ * Wrap CustomScanState children's ExecProcNodes with ExecProcNodeExplain
+ */
+static void
+WrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ WrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * WrapExecProcNodeWithExplain -
+ * Wrap ExecProcNode with ExecProcNodeExplain recursively
+ */
+static void
+WrapExecProcNodeWithExplain(PlanState *ps)
+{
+ /* wrapping can be done only once */
+ if (ps->ExecProcNodeOriginal != NULL)
+ return;
+
+ check_stack_depth();
+
+ ps->ExecProcNodeOriginal = ps->ExecProcNode;
+ ps->ExecProcNode = ExecProcNodeExplain;
+
+ if (ps->lefttree != NULL)
+ WrapExecProcNodeWithExplain(ps->lefttree);
+ if (ps->righttree != NULL)
+ WrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ ereport(DEBUG1, errmsg("wrapping Append"));
+ WrapMultiExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ ereport(DEBUG1, errmsg("wrapping MergeAppend"));
+ WrapMultiExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ ereport(DEBUG1, errmsg("wrapping BitmapAndState"));
+ WrapMultiExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ ereport(DEBUG1, errmsg("wrapping BitmapOrtate"));
+ WrapMultiExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ ereport(DEBUG1, errmsg("wrapping Subquery"));
+ WrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ ereport(DEBUG1, errmsg("wrapping CustomScanState"));
+ WrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * UnwrapMultiExecProcNodesWithExplain -
+ * Unwrap an array of PlanStates' ExecProcNodes wrapped with ExecProcNodeExplain
+ */
+static void
+UnwrapMultiExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ UnwrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * UnwrapCustomPlanChildExecProcNodesWithExplain -
+ * Unwrap CustomScanState children's ExecProcNodes wrapped with ExecProcNodeExplain
+ */
+static void
+UnwrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ UnwrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * UnwrapExecProcNodeWithExplain -
+ * Unwrap ExecProcNode wrapped with ExecProcNodeExplain recursively
+ */
+static void
+UnwrapExecProcNodeWithExplain(PlanState *ps)
+{
+ Assert(ps->ExecProcNodeOriginal != NULL);
+
+ check_stack_depth();
+
+ ps->ExecProcNode = ps->ExecProcNodeOriginal;
+ ps->ExecProcNodeOriginal = NULL;
+
+ if (ps->lefttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->lefttree);
+ if (ps->righttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ ereport(DEBUG1, errmsg("unwrapping Append"));
+ UnwrapMultiExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ ereport(DEBUG1, errmsg("unwrapping MergeAppend"));
+ UnwrapMultiExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ ereport(DEBUG1, errmsg("unwrapping BitmapAndState"));
+ UnwrapMultiExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ ereport(DEBUG1, errmsg("unwrapping BitmapOrtate"));
+ UnwrapMultiExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ ereport(DEBUG1, errmsg("unwrapping Subquery"));
+ UnwrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ ereport(DEBUG1, errmsg("unwrapping CustomScanState"));
+ UnwrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * ProgressiveExplainReleaseFunc
+ * Memory context release callback function to remove
+ * plan from explain hash and disable the timeout.
+ */
+static void
+ProgressiveExplainReleaseFunc(void *arg)
+{
+ /* Remove row from hash */
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+ LWLockRelease(ExplainHashLock);
+ if (entry)
+ ProgressiveExplainCleanup(NULL);
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(progressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(progressiveExplainHashKey);
+ info.entrysize = sizeof(progressiveExplainHashEntry);
+
+ progressiveExplainArray = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 3
+ HASH_SEQ_STATUS hash_seq;
+ progressiveExplainHashEntry *entry;
+ dsa_area *a;
+ progressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainArray);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->p) ||
+ MyProcPid == entry->key.pid)
+ continue;
+
+ a = dsa_attach(entry->h);
+ ped = dsa_get_address(a, entry->p);
+
+ values[0] = Int32GetDatum(ped->pid);
+ values[1] = TimestampTzGetDatum(ped->last_print);
+
+ if (superuser())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ {
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ if (beentry->st_userid == GetUserId())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ values[2] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0493b7d5365..52b8b2bd1f7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -157,6 +158,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Setup progressive explain if enabled.
+ */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -182,6 +189,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Add a back-reference to the QueryDesc.
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -267,6 +279,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -516,6 +534,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..3af8e9d964d 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -461,8 +463,14 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed70367..15d8a3b28a8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up progressive explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 8adf2730277..2b4393d3635 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -176,6 +176,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f071628..890acf02da9 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -346,6 +346,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ExplainHash "Waiting to access backend explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ee1a9d5d98b..6aee6f08b75 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -80,6 +81,8 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainStartupTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -757,6 +760,10 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ ProgressiveExplainStartupTimeoutHandler);
}
/*
@@ -1418,6 +1425,18 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainStartupTimeoutHandler(void)
+{
+ ProgressiveExplainTrigger();
+}
+
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index ad25cbb39c5..e6cd1d8781c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -40,6 +40,7 @@
#include "catalog/storage.h"
#include "commands/async.h"
#include "commands/event_trigger.h"
+#include "commands/explain.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -476,6 +477,22 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -530,6 +547,15 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = false;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+int progressive_explain_min_duration = 1000;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2118,6 +2144,61 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about buffers is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3785,6 +3866,30 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_min_duration", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Min query duration to start printing instrumented "
+ "progressive explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_min_duration,
+ 1000, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5299,6 +5404,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2d1de9c37bd..7da505564d4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -655,6 +655,19 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_min_duration = 1s
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = off
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 134b3dd8689..67be2633e50 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12450,4 +12450,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'bool',
+ proallargtypes => '{bool,int4,timestamptz,text}',
+ proargmodes => '{i,o,o,o}',
+ proargnames => '{mode,pid,last_print,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 64547bd9b9c..50e90815936 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -13,6 +13,7 @@
#ifndef EXPLAIN_H
#define EXPLAIN_H
+#include "datatype/timestamp.h"
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
@@ -32,6 +33,13 @@ typedef enum ExplainFormat
EXPLAIN_FORMAT_YAML,
} ExplainFormat;
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
typedef struct ExplainWorkersState
{
int num_workers; /* # of worker processes the plan used */
@@ -67,12 +75,37 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
+
+ /* state related to progressive explains */
+ struct PlanState *pe_curr_node;
+ struct Instrumentation *pe_local_instr;
+ dsa_area *pe_a;
} ExplainState;
+typedef struct progressiveExplainHashKey
+{
+ int pid; /* PID */
+} progressiveExplainHashKey;
+
+typedef struct progressiveExplainHashEntry
+{
+ progressiveExplainHashKey key; /* hash key of entry - MUST BE FIRST */
+ dsa_handle h;
+ dsa_pointer p;
+} progressiveExplainHashEntry;
+
+typedef struct progressiveExplainData
+{
+ int pid;
+ TimestampTz last_print;
+ char plan[];
+} progressiveExplainData;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -120,4 +153,15 @@ extern void ExplainPrintJITSummary(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainQueryText(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int maxlen);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+extern bool ProgressiveExplainPending;
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..a6d2362c6c9 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pe_es; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a323fa98bbb..3ace9a88636 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -763,6 +764,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1159,6 +1163,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+ ExecProcNodeMtd ExecProcNodeOriginal; /* pointer to original function
+ * another wrapper was added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 13a7dc89980..eef4af1a3e7 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -217,6 +217,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf565452382..43f10a51862 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 1233e07d7da..365c933ab00 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,15 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_min_duration;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..ea66a0505d9 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,8 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..db6adfa89d8
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the local
+# session, even if progressive explain is enabled. For other sessions pg_stat_progress_explain
+# should contain data for a PID only if progressive_explain is enabled and a query
# is running. Data needs to be removed when the query finishes (or gets cancelled).
+
+use strict;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
# Configure progressive explain to be logged immediately
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled pg_stat_progress_explain should not contain
+ # row for PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # 1 row for pid is expected for running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('explain');
+test_local_session('analyze');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('explain');
+test_peer_session('analyze');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 62f69ac20b2..4968db5607f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,10 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT pid,
+ last_print,
+ query_plan
+ FROM pg_stat_progress_explain(true) pg_stat_progress_explain(pid, last_print, query_plan);
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9840060997f..f1a74454187 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2267,6 +2267,7 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
ProjectSet
ProjectSetPath
ProjectSetState
@@ -3870,6 +3871,9 @@ process_sublinks_context
proclist_head
proclist_mutable_iter
proclist_node
+progressiveExplainData
+progressiveExplainHashEntry
+progressiveExplainHashKey
promptStatus_t
pthread_barrier_t
pthread_cond_t
--
2.43.0
The new GUC progressive_explain_min_duration was implemented with timeouts:
the timeout callback function is used to initialize the progressive explain.

There is a catch to this implementation. In the thread
/messages/by-id/d68c3ae31672664876b22d2dcbb526d2@postgrespro.ru
where torikoshia proposes logging of query plans, concerns were raised about
logging plans in the CFI, a sensitive part of the code. So torikoshia
implemented a smart workaround that adjusts the execProcNode wrapper of all
nodes so that the plan printing can be done there.

I'm not sure whether the same concern applies to timeout callbacks, so I also
implemented a second version of the latest patch that uses that execProcNode
wrapper strategy. The wrapper code was implemented by torikoshia
(torikoshia@oss.nttdata.com), so I'm adding the credits here.
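
To illustrate the intended timeout-driven behavior, here is a minimal sketch
of a session-level setup (the table name and durations are arbitrary examples;
only the GUCs and the view columns come from the attached patch):

    -- session 1: enable instrumented progressive explain with a short delay
    SET progressive_explain = 'analyze';
    SET progressive_explain_min_duration = '100ms';
    SET progressive_explain_interval = '500ms';
    SELECT count(*) FROM test t1, test t2;

    -- session 2: a row for session 1 appears only after min_duration has
    -- elapsed, and the plan is then refreshed at every interval
    SELECT pid, last_print, query_plan FROM pg_stat_progress_explain;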
I did additional benchmarks and found issues with the patch that doesn't do
execProcNode wrapping: there are sporadic crashes with "double free or
corruption (top)". So I'm making the patch that uses the wrapper the current
one. Again, credits to torikoshia as the owner of that section of the code.

Rafael.
Attachments:
v8-0001-Proposal-for-progressive-explains.patch (application/octet-stream)
From 894e8f19d73f2e7ad19a5357fafb3c943f9762d8 Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Fri, 7 Mar 2025 17:00:45 -0300
Subject: [PATCH v8] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan is printed
only once at the beginning of the query. If set to 'analyze' the QueryDesc
is adjusted to add instrumentation flags. In that case the plan
is printed on a fixed interval controlled by the new GUC parameter
progressive_explain_interval, including all instrumentation stats
computed so far (per node rows and execution time).
New view:
- pg_stat_progress_explain
- pid: PID of the process running the query
- last_print: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_min_duration: min query duration until progressive
explain starts.
- type: int
- default: 1s
- min: 0
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [TEXT, XML, JSON, or YAML]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
---
doc/src/sgml/config.sgml | 134 +++
doc/src/sgml/monitoring.sgml | 64 ++
doc/src/sgml/perform.sgml | 96 +++
src/backend/catalog/system_views.sql | 5 +
src/backend/commands/explain.c | 815 +++++++++++++++++-
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 10 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 19 +
src/backend/utils/misc/guc_tables.c | 125 +++
src/backend/utils/misc/postgresql.conf.sample | 13 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 44 +
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 9 +
src/include/utils/timeout.h | 2 +
.../test_misc/t/008_progressive_explain.pl | 130 +++
src/test/regress/expected/rules.out | 4 +
src/tools/pgindent/typedefs.list | 4 +
26 files changed, 1497 insertions(+), 50 deletions(-)
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d2fa5f7d1a9..6f0b0d06c7b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8483,6 +8483,140 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum amount of time (in milliseconds) a query must run
+ before its progressive explain is printed. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about buffers is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about per node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information about modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 16646f560e8..8a9934d53e1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6793,6 +6793,70 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_print</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 91feb59abd1..c79e237f813 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1109,6 +1109,102 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any active query can be
+ visualized by any session via the <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query, and after the
+ minimum duration specified by <xref linkend="guc-progressive-explain-min-duration"/>
+ has passed. The settings <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/> and
+ <xref linkend="guc-progressive-explain-settings"/>
+ control how the plan is printed and which details are added to it.
+ </para>
+
+ <para>
+   When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>,
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:37:41.781459-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+   Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+   enables instrumentation, and the detailed plan is printed at a fixed interval
+   controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+   the accumulated row count and other statistics for each node.
+ </para>
+
+ <para>
+   Progressive explains include additional per-node information to help
+   analyze execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+      never executed: a plan node that has not been processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+ pid | last_print | query_plan
+------+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------
+ 5307 | 2025-02-18 09:36:03.580721-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=2010.504..2998.111 rows=38603 loops=1) +
+ | | Hash Cond: (t1.c1 = t2.c1) +
+ | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.068..303.963 rows=4928320 loops=1) (current)+
+ | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=2009.824..2009.824 rows=10000000 loops=1) +
+ | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.035..640.890 rows=10000000 loops=1) +
+ | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a4d2cfdcaf5..dd746078ea7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,11 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d8a7232cedb..0ff07d39e9b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -21,6 +21,7 @@
#include "commands/explain_format.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
+#include "funcapi.h"
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
@@ -32,6 +33,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
+#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/guc_tables.h"
#include "utils/json.h"
@@ -39,17 +41,28 @@
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
+#include "utils/timeout.h"
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
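+/*
+ * Extra DSA space allocated beyond the length of the currently printed plan,
+ * so that moderately larger plans printed later do not force a reallocation.
+ */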
+#define PROGRESSIVE_EXPLAIN_ALLOC_SIZE 4096
+
/* Hook for plugins to get control in ExplainOneQuery() */
ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainArray = NULL;
+
+/* Pointer to running query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Flag set by timeouts to control when to print progressive explains */
+bool ProgressiveExplainPending = false;
/*
* Various places within need to convert bytes to kilobytes. Round these up
@@ -128,7 +141,7 @@ static void show_hashagg_info(AggState *aggstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -154,6 +167,13 @@ static ExplainWorkersState *ExplainCreateWorkersState(int num_workers);
static void ExplainOpenWorker(int n, ExplainState *es);
static void ExplainCloseWorker(int n, ExplainState *es);
static void ExplainFlushWorkersState(ExplainState *es);
+static void ProgressiveExplainInit(QueryDesc *queryDesc);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(QueryDesc *queryDesc);
+static TupleTableSlot *ExecProcNodeExplain(PlanState *node);
+static void WrapExecProcNodeWithExplain(PlanState *ps);
+static void UnwrapExecProcNodeWithExplain(PlanState *ps);
+static void ProgressiveExplainReleaseFunc(void *);
@@ -366,6 +386,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -1490,6 +1512,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1953,17 +1976,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+	 * For regular explains, instrumentation cleanup is done directly on the
+	 * main instrumentation objects. Progressive explains need to copy the
+	 * instrumentation object and forcibly end the loop, since nodes may still
+	 * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1973,6 +2017,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
}
else
{
@@ -1985,6 +2032,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+				/* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
}
}
else if (es->analyze)
@@ -2089,29 +2140,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
break;
case T_BitmapIndexScan:
show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -2122,11 +2173,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2143,7 +2194,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2154,7 +2205,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2178,7 +2229,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2212,7 +2263,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2226,7 +2277,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2244,7 +2295,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2261,14 +2312,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2278,7 +2329,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2288,11 +2339,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2301,11 +2352,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2314,11 +2365,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2326,13 +2377,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(((WindowAgg *) plan)->runConditionOrig,
"Run Condition", planstate, ancestors, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
@@ -2342,7 +2393,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2364,7 +2415,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2409,10 +2460,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3929,19 +3980,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4612,7 +4663,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4621,11 +4672,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4645,11 +4709,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -4931,3 +5008,647 @@ ExplainFlushWorkersState(ExplainState *es)
pfree(wstate->worker_state_save);
pfree(wstate);
}
+
+/*
+ * ProgressiveExplainSetup
+ * Adjusts QueryDesc with instrumentation for progressive explains.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we adjust QueryDesc accordingly even if
+ * the query was not initiated with EXPLAIN ANALYZE. This will
+ * directly affect query execution and add computational overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Adjust instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+}
+
+/*
+ * ProgressiveExplainStart
+ * Progressive explain start point.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ activeQueryDesc = queryDesc;
+ queryDesc->pe_es = NULL;
+
+ /* Timeout is only needed if duration > 0 */
+ if (progressive_explain_min_duration == 0)
+ ProgressiveExplainInit(queryDesc);
+ else
+ enable_timeout_after(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ progressive_explain_min_duration);
+}
+
+/*
+ * ProgressiveExplainInit
+ *		Responsible for initializing all structures related to progressive
+ *		explains.
+ *
+ * We define an ExplainState that is reused for every iteration of
+ * plan printing.
+ *
+ * Progressive explain plans are printed to shared memory via DSAs. A
+ * dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer values.
+ *
+ * A memory context reset callback is registered for manual resource release
+ * in case of query cancellation.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise
+ * the plain plan is printed only once.
+ */
+void
+ProgressiveExplainInit(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ProgressiveExplainReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pe_es = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+
+	/* Create the DSA and share it through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ key.pid = MyProcPid;
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_ENTER,
+ &found);
+
+ entry->h = dsa_get_handle(es->pe_a);
+ entry->p = (dsa_pointer) NULL;
+
+ LWLockRelease(ExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+	/* Print the progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainTrigger
+ * Called by the timeout handler to start printing progressive
+ * explain plans.
+ */
+void
+ProgressiveExplainTrigger(void)
+{
+ WrapExecProcNodeWithExplain(activeQueryDesc->planstate);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pe_es->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pe_es->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint
+ *		Prints the progressive explain to shared memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan, and updates the DSA with the new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the currently printed
+ * plan. There may be reallocations in subsequent calls if new
+ * plans don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+ progressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pe_es;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+
+ /* Entry exists */
+ if (entry)
+ {
+ /* Plan was never printed */
+ if (!entry->p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->p);
+
+ /*
+ * Plan does not fit in existing shared memory area.
+ * Reallocation is needed.
+ */
+ if (strlen(es->str->data) >
+ add_size(strlen(pe_data->plan),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE))
+ {
+ dsa_free(es->pe_a, entry->p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently
+ * printed query plan with an additional delta defined by
+			 * PROGRESSIVE_EXPLAIN_ALLOC_SIZE. This strategy prevents
+			 * having to reallocate the segment very often, which would
+			 * otherwise be needed whenever the length of the next printed
+			 * plan exceeds the previously allocated size.
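+			 * For example, with the 4096-byte PROGRESSIVE_EXPLAIN_ALLOC_SIZE
+			 * delta, a 10 kB plan gets roughly 14 kB of DSA space, so later
+			 * plans can grow by up to 4 kB beyond the previously stored plan
+			 * before a reallocation is required.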
+ */
+ entry->p = dsa_allocate(es->pe_a,
+ add_size(sizeof(progressiveExplainData),
+ add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE)));
+ pe_data = dsa_get_address(es->pe_a, entry->p);
+ pe_data->pid = MyProcPid;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_print = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ExplainHashLock);
+ }
+}
+
+/*
+ * ProgressiveExplainFinish
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ /* Startup timeout hasn't triggered yet, just disable it */
+ if (get_timeout_active(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT))
+ disable_timeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT, false);
+	/* Initial progressive explain was done, clean up everything */
+ else if (queryDesc && queryDesc->pe_es)
+ ProgressiveExplainCleanup(queryDesc);
+}
+
+/*
+ * ProgressiveExplainCleanup
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures that are not automatically released when
+ * the memory context is reset. Current tasks are:
+ * - remove the local backend's entry from the progressive explain hash
+ * - detach from the DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(QueryDesc *queryDesc)
+{
+ progressiveExplainHashKey key;
+
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ /* Reset querydesc tracker */
+ activeQueryDesc = NULL;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_EXCLUSIVE);
+
+ /*
+	 * Only detach from the DSA if the query ended gracefully, i.e., if
+	 * ProgressiveExplainCleanup was called from
+	 * ProgressiveExplainFinish.
+ */
+ if (queryDesc)
+ dsa_detach(queryDesc->pe_es->pe_a);
+ hash_search(progressiveExplainArray, &key, HASH_REMOVE, NULL);
+ LWLockRelease(ExplainHashLock);
+}
+
+/*
+ * ExecProcNodeExplain
+ *		ExecProcNode wrapper that initializes progressive explain
+ *		and unwraps ExecProcNode back to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)
+{
+ /* Initialize progressive explain */
+ ProgressiveExplainInit(node->state->query_desc);
+
+ /* Unwrap exec proc node for all nodes */
+ UnwrapExecProcNodeWithExplain(node->state->query_desc->planstate);
+
+ /*
+	 * Since unwrapping has already been done, call ExecProcNode(), not
+	 * ExecProcNodeOriginal().
+ */
+ return node->ExecProcNode(node);
+}
+
+/*
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+	 * Update the progressive explain after the timeout has been reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * WrapMultiExecProcNodesWithExplain -
+ *		Wrap the ExecProcNode of an array of PlanStates with ExecProcNodeExplain
+ */
+static void
+WrapMultiExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ WrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * WrapCustomPlanChildExecProcNodesWithExplain -
+ *		Wrap CustomScanState children's ExecProcNodes with ExecProcNodeExplain
+ */
+static void
+WrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ WrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * WrapExecProcNodeWithExplain -
+ *		Wrap ExecProcNode with ExecProcNodeExplain recursively
+ */
+static void
+WrapExecProcNodeWithExplain(PlanState *ps)
+{
+ /* wrapping can be done only once */
+ if (ps->ExecProcNodeOriginal != NULL)
+ return;
+
+ check_stack_depth();
+
+ ps->ExecProcNodeOriginal = ps->ExecProcNode;
+ ps->ExecProcNode = ExecProcNodeExplain;
+
+ if (ps->lefttree != NULL)
+ WrapExecProcNodeWithExplain(ps->lefttree);
+ if (ps->righttree != NULL)
+ WrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ WrapMultiExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ WrapMultiExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ WrapMultiExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ WrapMultiExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ WrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ WrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * UnwrapMultiExecProcNodesWithExplain -
+ *		Unwrap the ExecProcNode of an array of PlanStates, restoring the original functions
+ */
+static void
+UnwrapMultiExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ UnwrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * UnwrapCustomPlanChildExecProcNodesWithExplain -
+ *		Unwrap CustomScanState children's ExecProcNodes, restoring the original functions
+ */
+static void
+UnwrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ UnwrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * UnwrapExecProcNodeWithExplain -
+ *		Unwrap ExecProcNode recursively, restoring the original function
+ */
+static void
+UnwrapExecProcNodeWithExplain(PlanState *ps)
+{
+ Assert(ps->ExecProcNodeOriginal != NULL);
+
+ check_stack_depth();
+
+ ps->ExecProcNode = ps->ExecProcNodeOriginal;
+ ps->ExecProcNodeOriginal = NULL;
+
+ if (ps->lefttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->lefttree);
+ if (ps->righttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ UnwrapMultiExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ UnwrapMultiExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ UnwrapMultiExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ UnwrapMultiExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ UnwrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ UnwrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * ProgressiveExplainReleaseFunc
+ *		Memory context reset callback that removes the plan from the
+ *		explain hash and disables the timeout.
+ */
+static void
+ProgressiveExplainReleaseFunc(void *arg)
+{
+ /* Remove row from hash */
+ progressiveExplainHashKey key;
+ progressiveExplainHashEntry *entry;
+
+ key.pid = MyProcPid;
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainArray,
+ &key,
+ HASH_FIND,
+ NULL);
+ LWLockRelease(ExplainHashLock);
+ if (entry)
+ ProgressiveExplainCleanup(NULL);
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(progressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(progressiveExplainHashKey);
+ info.entrysize = sizeof(progressiveExplainHashEntry);
+
+ progressiveExplainArray = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain
+ *		Return progressive explain plans published by other backends.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 3
+ HASH_SEQ_STATUS hash_seq;
+ progressiveExplainHashEntry *entry;
+ dsa_area *a;
+ progressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainArray);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->p) ||
+ MyProcPid == entry->key.pid)
+ continue;
+
+ a = dsa_attach(entry->h);
+ ped = dsa_get_address(a, entry->p);
+
+ values[0] = ped->pid;
+ values[1] = TimestampTzGetDatum(ped->last_print);
+
+ if (superuser())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ {
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ if (beentry->st_userid == GetUserId())
+ values[2] = CStringGetTextDatum(ped->plan);
+ else
+ values[2] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0493b7d5365..52b8b2bd1f7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -157,6 +158,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Setup progressive explain if enabled.
+ */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -182,6 +189,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Adding back reference to QueryDesc
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -267,6 +279,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -516,6 +534,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..3af8e9d964d 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -461,8 +463,14 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed70367..15d8a3b28a8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up instrumented explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 8adf2730277..2b4393d3635 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -176,6 +176,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f071628..890acf02da9 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -346,6 +346,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ExplainHash	"Waiting to read or update the progressive explain hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ee1a9d5d98b..6aee6f08b75 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -80,6 +81,8 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainStartupTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -757,6 +760,10 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ ProgressiveExplainStartupTimeoutHandler);
}
/*
@@ -1418,6 +1425,18 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainStartupTimeoutHandler(void)
+{
+ ProgressiveExplainTrigger();
+}
+
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index ad25cbb39c5..e6cd1d8781c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -40,6 +40,7 @@
#include "catalog/storage.h"
#include "commands/async.h"
#include "commands/event_trigger.h"
+#include "commands/explain.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -476,6 +477,22 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -530,6 +547,15 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = false;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+int progressive_explain_min_duration = 1000;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2118,6 +2144,61 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about buffers is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information about WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3785,6 +3866,30 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_min_duration", PGC_USERSET, STATS_MONITORING,
+			gettext_noop("Sets the minimum query duration before instrumented "
+						 "progressive explains start being printed."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_min_duration,
+ 1000, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5299,6 +5404,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2d1de9c37bd..7da505564d4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -655,6 +655,19 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_min_duration = 1s
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = off
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cede992b6e2..6ff212dfb7a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12469,4 +12469,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'bool',
+ proallargtypes => '{bool,int4,timestamptz,text}',
+ proargmodes => '{i,o,o,o}',
+ proargnames => '{mode,pid,last_print,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 64547bd9b9c..50e90815936 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -13,6 +13,7 @@
#ifndef EXPLAIN_H
#define EXPLAIN_H
+#include "datatype/timestamp.h"
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
@@ -32,6 +33,13 @@ typedef enum ExplainFormat
EXPLAIN_FORMAT_YAML,
} ExplainFormat;
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
typedef struct ExplainWorkersState
{
int num_workers; /* # of worker processes the plan used */
@@ -67,12 +75,37 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
+
+ /* state related to progressive explains */
+ struct PlanState *pe_curr_node;
+ struct Instrumentation *pe_local_instr;
+ dsa_area *pe_a;
} ExplainState;
+typedef struct progressiveExplainHashKey
+{
+ int pid; /* PID */
+} progressiveExplainHashKey;
+
+typedef struct progressiveExplainHashEntry
+{
+ progressiveExplainHashKey key; /* hash key of entry - MUST BE FIRST */
+	dsa_handle	h;				/* handle of this backend's DSA */
+	dsa_pointer p;				/* pointer to its progressiveExplainData */
+} progressiveExplainHashEntry;
+
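+/*
+ * Progressive explain data kept in each backend's DSA. The plan member is a
+ * NUL-terminated string holding the most recently printed plan.
+ */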
+typedef struct progressiveExplainData
+{
+ int pid;
+ TimestampTz last_print;
+ char plan[];
+} progressiveExplainData;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -120,4 +153,15 @@ extern void ExplainPrintJITSummary(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainQueryText(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int maxlen);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+extern bool ProgressiveExplainPending;
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..a6d2362c6c9 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pe_es; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a323fa98bbb..3ace9a88636 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -763,6 +764,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1159,6 +1163,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+ ExecProcNodeMtd ExecProcNodeOriginal; /* pointer to original function
+ * when another wrapper was added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 13a7dc89980..eef4af1a3e7 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -217,6 +217,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf565452382..43f10a51862 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 1233e07d7da..365c933ab00 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,15 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_min_duration;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..ea66a0505d9 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,8 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..db6adfa89d8
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the local
+# session, even if progressive explain is enabled. For other sessions, it should
+# contain data for a PID only if progressive_explain is enabled and a query is
+# running. The data needs to be removed when the query finishes (or gets cancelled).
+
+use strict;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
+# Configure progressive explain to be logged immediatelly
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled, pg_stat_progress_explain should not
+ # contain a row for the PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # One row for the PID is expected for a running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('explain');
+test_local_session('analyze');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('explain');
+test_peer_session('analyze');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 62f69ac20b2..4968db5607f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,10 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT pid,
+ last_print,
+ query_plan
+ FROM pg_stat_progress_explain(true) pg_stat_progress_explain(pid, last_print, query_plan);
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9840060997f..f1a74454187 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2267,6 +2267,7 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
ProjectSet
ProjectSetPath
ProjectSetState
@@ -3870,6 +3871,9 @@ process_sublinks_context
proclist_head
proclist_mutable_iter
proclist_node
+progressiveExplainData
+progressiveExplainHashEntry
+progressiveExplainHashKey
promptStatus_t
pthread_barrier_t
pthread_cond_t
--
2.43.0
On Fri, Mar 7, 2025, at 5:34 PM, Rafael Thofehrn Castro wrote:
Did additional benchmarks and found issues with the patch that doesn't do execProcNode
wrapping. There are sporadic crashes with **double free or corruption (top)**
So I am making the patch that uses the wrapper the current one. Again, giving credit to
torikoshia as the owner of that section of the code.
Rafael, thanks for working on it. It is a step forward in observability. I
started with some performance tests and the latest improvements seem to fix the
overhead imposed by the initial patch version. I didn't notice any of these new
functions in the perf report while executing fast queries.
I found a crash. It is simple to reproduce.
Session A:
select * from pg_stat_progress_explain;
\watch 2
Session B:
explain select pg_sleep(30);
\watch 2
8<--------------------------------------------------------------------8<
Backtrace:
Core was generated by `postgres: euler postgres [lo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401
5401 if (ps->ExecProcNodeOriginal != NULL)
#0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401
#1 0x00005624173829aa in handle_sig_alarm (postgres_signal_arg=<optimized out>) at timeout.c:414
#2 0x00005624173ba02c in wrapper_handler (postgres_signal_arg=14) at pqsignal.c:110
#3 <signal handler called>
#4 0x00007f20fa529e63 in epoll_wait (epfd=6, events=0x56244ef37e58, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#5 0x00005624171fb02f in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffdd9e62080, cur_timeout=-1, set=0x56244ef37dd8) at waiteventset.c:1190
#6 WaitEventSetWait (set=0x56244ef37dd8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffdd9e62080, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=100663296) at waiteventset.c:1138
#7 0x000056241709513c in secure_read (port=0x56244eeb90e0, ptr=0x56241775a9a0 <PqRecvBuffer>, len=8192) at be-secure.c:218
#8 0x000056241709bf2e in pq_recvbuf () at pqcomm.c:924
#9 0x000056241709ceb5 in pq_getbyte () at pqcomm.c:970
#10 0x000056241721b617 in SocketBackend (inBuf=0x7ffdd9e622a0) at postgres.c:361
#11 ReadCommand (inBuf=0x7ffdd9e622a0) at postgres.c:484
#12 PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4625
#13 0x00005624172167ed in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at backend_startup.c:107
#14 0x000056241717519b in postmaster_child_launch (child_type=<optimized out>, child_slot=2, startup_data=startup_data@entry=0x7ffdd9e6253c, startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7ffdd9e62540) at launch_backend.c:274
#15 0x0000562417178c32 in BackendStartup (client_sock=0x7ffdd9e62540) at postmaster.c:3519
#16 ServerLoop () at postmaster.c:1688
#17 0x000056241717a6da in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x56244eeb81b0) at postmaster.c:1386
#18 0x0000562416e64f9a in main (argc=1, argv=0x56244eeb81b0) at main.c:230
8<--------------------------------------------------------------------8<
You call this feature "progressive explain". My first impression is that it
will only provide the execution plans for EXPLAIN commands. Instead of
"progressive explain", I would suggest "query progress", which is general
database terminology. That said, it seems natural to use "progressive explain"
since you are using the explain infrastructure (including the same options --
format, settings, wal, ...) to print the execution plan.
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+
There is no use for the function argument. If you decide to keep this function,
remove the argument.
Why don't you use the pgstat_progress_XXX() API? Since you are using a
pg_stat_progress_XXX view name, I would expect you to use the command progress
reporting infrastructure (see backend_progress.c).
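For reference, that API only publishes a fixed set of int64 counters per
backend; a minimal usage sketch (the command type and parameter values below
are placeholders, not something taken from this patch):

#include "postgres.h"

#include "utils/backend_progress.h"

static void
report_progress_sketch(Oid relid)
{
	/* advertise this backend as running a command of the given type */
	pgstat_progress_start_command(PROGRESS_COMMAND_VACUUM, relid);

	/* publish a numeric value into one of the fixed int64 parameter slots */
	pgstat_progress_update_param(0, 42);

	/* clear this backend's progress entry when done */
	pgstat_progress_end_command();
}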
Maybe you could include datid and datname as the other progress reporting
views. It would avoid a join to figure out what the database is.
+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
Isn't it the same definition as in auto_explain.c? Use only one definition for
auto_explain and this feature. You can put this struct into explain.c, use an
extern declaration for guc_tables.c, and put an extern PGDLLIMPORT declaration
into guc.h. See wal_level_options for an example.
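A rough sketch of that arrangement (the exact header to use is an assumption
here; the v9 patch below ends up doing essentially this in explain.c):

/* explain.c: single shared definition (needs utils/guc.h for config_enum_entry) */
const struct config_enum_entry explain_format_options[] = {
	{"text", EXPLAIN_FORMAT_TEXT, false},
	{"xml", EXPLAIN_FORMAT_XML, false},
	{"json", EXPLAIN_FORMAT_JSON, false},
	{"yaml", EXPLAIN_FORMAT_YAML, false},
	{NULL, 0, false}
};

/* shared header (e.g. utils/guc.h), visible to guc_tables.c and auto_explain.c */
extern PGDLLIMPORT const struct config_enum_entry explain_format_options[];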
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
In auto_explain, "analyze" is a separate option. Should we have 2 options?
One that enables/disables this feature and another one that enables/disables
the analyze option.
Don't the other EXPLAIN options make sense here? Like serialize and summary.
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pe_es; /* progressive explain state if enabled */
Should you use the same name pattern here? pestate, for example.
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
Could you use a more specific name? Even if you keep the proposed feature name,
you should use ProgressiveExplainHash, ProgressiveExplain or QueryProgress.
+$node->init;
+# Configure progressive explain to be logged immediatelly
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;
s/immediatelly/immediately/
+typedef struct progressiveExplainHashKey
+{
+ int pid; /* PID */
+} progressiveExplainHashKey;
+
+typedef struct progressiveExplainHashEntry
+{
+ progressiveExplainHashKey key; /* hash key of entry - MUST BE FIRST */
+ dsa_handle h;
+ dsa_pointer p;
+} progressiveExplainHashEntry;
You don't need progressiveExplainHashKey. Use pid as key directly.
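Something along these lines, i.e. the PID itself is the leading field (sketch
only; the helper name is made up, and it assumes the table was created with
keysize = sizeof(int) and HASH_BLOBS):

#include "postgres.h"

#include "miscadmin.h"
#include "utils/dsa.h"
#include "utils/hsearch.h"

typedef struct progressiveExplainHashEntry
{
	int			pid;			/* hash key - must be the first field */
	dsa_handle	h;
	dsa_pointer p;
} progressiveExplainHashEntry;

/* hypothetical helper: find or create this backend's entry, keyed by PID */
static progressiveExplainHashEntry *
pe_hash_enter(HTAB *hash)
{
	bool		found;

	return (progressiveExplainHashEntry *) hash_search(hash,
													   &MyProcPid,
													   HASH_ENTER,
													   &found);
}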
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_print = GetCurrentTimestamp();
I don't think last_print is accurate because it is not the time the execution plan
is printed but the time it was updated. I suggest last_updated_time.
+/* Flag set by timeouts to control when to print progressive explains */
+bool ProgressiveExplainPending = false;
s/print/update/
There are other points where you use "print"; it would be better to replace
them with "update".
+ progressiveExplainArray = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
I'm wondering why you use "array" in the name. ProgressiveExplainHash is a
better name.
+ entry->p = dsa_allocate(es->pe_a,
+ add_size(sizeof(progressiveExplainData),
+ add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE)));
I think you need a better name for PROGRESSIVE_EXPLAIN_ALLOC_SIZE because it
doesn't reflect what it is. PROGRESSIVE_EXPLAIN_FREE_SIZE or
PROGRESSIVE_EXPLAIN_AVAIL_SIZE?
Maybe you can use dshash.
There are some comments that still refer to the wrong function name.
+/*
+ * ExecProcNodeWithExplain
+ * ExecProcNode wrapper that initializes progressive explain
+ * and uwraps ExecProdNode to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)
and
+/*
+ * ExecProcNodeWithExplain
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ /* state related to progressive explains */
+ struct PlanState *pe_curr_node;
+ struct Instrumentation *pe_local_instr;
+ dsa_area *pe_a;
Could you add some comments saying what each of these variables are for?
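For example, something like this, based on how the fields are used elsewhere
in the patch (the wording is mine):

	/* state related to progressive explains */
	struct PlanState *pe_curr_node; /* node being executed when the plan was
									 * last updated; marked as "(current)" */
	struct Instrumentation *pe_local_instr; /* scratch copy of a node's
											 * instrumentation, so live counters
											 * are never modified mid-run */
	dsa_area   *pe_a;			/* per-backend DSA holding the printed plan,
								 * advertised through the shared hash */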
I didn't experiment but I was wondering if there is a way to avoid the
duplicates that you added to avoid the overhead.
--
Euler Taveira
EDB https://www.enterprisedb.com/
Thanks for the valuable inputs Euler. Adjusted most of your recommendations.
I found a crash. It is simple to reproduce.
Indeed, I failed to test plain EXPLAIN after the addition of the new GUC
progressive_explain_min_duration. This is fixed.
You call this feature "progressive explain". My first impression is that it
will only provide the execution plans for EXPLAIN commands. Instead of
"progressive explain", I would suggest "query progress", which is general
database terminology. That said, it seems natural to use "progressive explain"
since you are using the explain infrastructure (including the same options --
format, settings, wal, ...) to print the execution plan.
Makes sense. Kept progressive explain for now but this is still open for
discussion.
There is no use for the function argument. If you decide to keep this function,
remove the argument.
Done.
Why don't you use the pgstat_progress_XXX() API? Since you are using a
pg_stat_progress_XXX view name I would expect using the command progress
reporting infrastructure (see backend_progress.c).
I haven't changed that part as of now. My implementation and underlying data
structure may not work well with that API, but I am investigating.
Maybe you could include datid and datname as the other progress reporting
views. It would avoid a join to figure out what the database is.
Done.
Isn't it the same definition as in auto_explain.c? Use only one definition for
auto_explain and this feature. You can put this struct into explain.c, use an
extern declaration for guc_tables.c, and put an extern PGDLLIMPORT declaration
into guc.h. See wal_level_options for an example.
Done.
In auto_explain, "analyze" is a separate option. Should we have 2 options?
One that enables/disables this feature and another one that enables/disables
the analyze option.
Tomas Vondra proposed the current logic and I think it makes sense. Having a
single GUC to control the execution behavior keeps the feature simpler IMO.
Don't the other EXPLAIN options make sense here? Like serialize and summary.
I added a missing GUC for option COSTS (progressive_explain_costs). Adding the
other ones doesn't make much sense IMO. SUMMARY, SERIALIZE and MEMORY are
information added at the end of query execution (or plan creation, for plain
EXPLAIN) in the summary section, but at that point the progressive explain will
already be finished, with no more information left in pg_stat_progress_explain.
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pe_es; /* progressive explain state if enabled */
Should you use the same name pattern here? pestate, for example.
Done.
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)
Could you use a more specific name? Even if you keep the proposed feature name,
you should use ProgressiveExplainHash, ProgressiveExplain or QueryProgress.
Changed to ProgressiveExplainHash.
You don't need progressiveExplainHashKey. Use pid as key directly.
Done.
I don't think last_print is accurate because it is not the time the execution
plan is printed but the time it was updated. I suggest last_updated_time.
Changed from last_print to last_update. This is still open for discussion.
I'm wondering why you use "array" in the name. ProgressiveExplainHash is a
better name.
I used it to be consistent with the ProcArray (which is also a hash). But what
you proposed is indeed better. Changed.
I think you need a better name for PROGRESSIVE_EXPLAIN_ALLOC_SIZE because it
doesn't reflect what it is. PROGRESSIVE_EXPLAIN_FREE_SIZE or
PROGRESSIVE_EXPLAIN_AVAIL_SIZE?
Changed to PROGRESSIVE_EXPLAIN_FREE_SIZE.
Fixed the wrong function names in the comments and changed the format of those
comments in function headers to be consistent with other functions in explain.c.
+ /* state related to progressive explains */
+ struct PlanState *pe_curr_node;
+ struct Instrumentation *pe_local_instr;
+ dsa_area *pe_a;
Could you add some comments saying what each of these variables are for?
Done.
I didn't experiment but I was wondering if there is a way to avoid the
duplicates that you added to avoid the overhead.
You mean the local instrumentation object reused for each node?
Rafael.
Attachments:
v9-0001-Proposal-for-progressive-explains.patch (application/octet-stream)
From e41a4364066a98695caca424294572679377fa3e Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Fri, 14 Mar 2025 12:33:14 -0300
Subject: [PATCH v9] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan will be printed
only once at the beginning of the query. If set to 'analyze' the QueryDesc
will be adjusted by adding instrumentation flags. In that case the plan
will be printed at a fixed interval controlled by the new GUC parameter
progressive_explain_interval, including all instrumentation stats
computed so far (per-node rows and execution time).
New view:
- pg_stat_progress_explain
- datid: OID of the database
- datname: Name of the database
- pid: PID of the process running the query
- last_update: timestamp when the plan was last updated
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_min_duration: minimum query duration before the
progressive explain starts.
- type: int
- default: 1s
- min: 0
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [TEXT, XML, JSON, or YAML]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: true
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_costs: controls whether estimated startup and
total cost of each plan node is added to the printed plan.
- type: bool
- default: true
- context: user
---
contrib/auto_explain/auto_explain.c | 10 +-
doc/src/sgml/config.sgml | 150 ++++
doc/src/sgml/monitoring.sgml | 82 ++
doc/src/sgml/perform.sgml | 97 ++
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/explain.c | 832 +++++++++++++++++-
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 10 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 19 +
src/backend/utils/misc/guc_tables.c | 129 +++
src/backend/utils/misc/postgresql.conf.sample | 14 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 41 +
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 11 +
src/include/utils/timeout.h | 2 +
.../test_misc/t/008_progressive_explain.pl | 130 +++
src/test/regress/expected/rules.out | 7 +
src/tools/pgindent/typedefs.list | 3 +
27 files changed, 1561 insertions(+), 59 deletions(-)
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 7007a226c08..131b22bc080 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -38,14 +38,6 @@ static int auto_explain_log_level = LOG;
static bool auto_explain_log_nested_statements = false;
static double auto_explain_sample_rate = 1;
-static const struct config_enum_entry format_options[] = {
- {"text", EXPLAIN_FORMAT_TEXT, false},
- {"xml", EXPLAIN_FORMAT_XML, false},
- {"json", EXPLAIN_FORMAT_JSON, false},
- {"yaml", EXPLAIN_FORMAT_YAML, false},
- {NULL, 0, false}
-};
-
static const struct config_enum_entry loglevel_options[] = {
{"debug5", DEBUG5, false},
{"debug4", DEBUG4, false},
@@ -187,7 +179,7 @@ _PG_init(void)
NULL,
&auto_explain_log_format,
EXPLAIN_FORMAT_TEXT,
- format_options,
+ explain_format_options,
PGC_SUSET,
0,
NULL,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8c82b39a89d..3a4f3304e73 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8562,6 +8562,156 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the threshold (in milliseconds) before the progressive explain is
+ printed for the first time. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on buffer usage is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on per-node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-costs" xreflabel="progressive_explain_costs">
+ <term><varname>progressive_explain_costs</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_costs</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on the estimated startup and total cost of
+ each plan node is added to progressive explains. Equivalent to the COSTS
+ option of EXPLAIN.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index aaa6586d3a4..9f654c4e649 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6828,6 +6828,88 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describes the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_update</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when the plan was last updated.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 387baac7e8c..04a78f29df9 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1169,6 +1169,103 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query can
+ be visualized by any session via the <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query, and after the
+ minimum duration specified by <xref linkend="guc-progressive-explain-min-duration"/>
+ has passed. Settings <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/>,
+ <xref linkend="guc-progressive-explain-settings"/> and
+ <xref linkend="guc-progressive-explain-costs"/>
+ define how the plan is printed and which details are added there.
+ </para>
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:01.606324-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ will enable instrumentation, and the detailed plan will be printed at a fixed
+ interval controlled by <xref linkend="guc-progressive-explain-interval"/>,
+ including per-node accumulated row counts and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional per-node information to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:53.951884-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=1581.469..2963.158 rows=64862.00 loops=1) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.079..486.702 rows=8258962.00 loops=1) (current)+
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=1580.933..1580.933 rows=10000000.00 loops=1) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.004..566.961 rows=10000000.00 loops=1) +
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a4d2cfdcaf5..460946a4079 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,16 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ S.datid AS datid,
+ D.datname AS datname,
+ S.pid,
+ S.last_update,
+ S.query_plan
+ FROM pg_stat_progress_explain() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 19ffcc2cacb..67e929c1056 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -21,6 +21,7 @@
#include "commands/explain_format.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
+#include "funcapi.h"
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
@@ -32,6 +33,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
+#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/guc_tables.h"
#include "utils/json.h"
@@ -39,17 +41,39 @@
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
+#include "utils/timeout.h"
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+#define PROGRESSIVE_EXPLAIN_FREE_SIZE 4096
+
+/*
+ * GUC support
+ */
+const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
/* Hook for plugins to get control in ExplainOneQuery() */
ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainHash = NULL;
+
+/* Pointer to running query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Flag set by timeouts to control when to update progressive explains */
+bool ProgressiveExplainPending = false;
/*
* Various places within need to convert bytes to kilobytes. Round these up
@@ -134,7 +158,7 @@ static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -160,6 +184,13 @@ static ExplainWorkersState *ExplainCreateWorkersState(int num_workers);
static void ExplainOpenWorker(int n, ExplainState *es);
static void ExplainCloseWorker(int n, ExplainState *es);
static void ExplainFlushWorkersState(ExplainState *es);
+static void ProgressiveExplainInit(QueryDesc *queryDesc);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(QueryDesc *queryDesc);
+static TupleTableSlot *ExecProcNodeExplain(PlanState *node);
+static void WrapExecProcNodeWithExplain(PlanState *ps);
+static void UnwrapExecProcNodeWithExplain(PlanState *ps);
+static void ProgressiveExplainReleaseFunc(void *);
@@ -372,6 +403,8 @@ NewExplainState(void)
es->costs = true;
/* Prepare output buffer. */
es->str = makeStringInfo();
+ /* An explain state is not progressive by default */
+ es->progressive = false;
return es;
}
@@ -726,6 +759,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
/* We can't run ExecutorEnd 'till we're done printing the stats... */
totaltime += elapsed_time(&starttime);
}
+ else
+ {
+ /*
+ * Handle progressive explain cleanup manually if enabled as it is not
+ * called as part of ExecutorFinish.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+ }
/* grab serialization metrics before we destroy the DestReceiver */
if (es->serialize != EXPLAIN_SERIALIZE_NONE)
@@ -1496,6 +1538,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1959,17 +2002,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is done directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1979,6 +2043,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
}
else
{
@@ -1991,6 +2058,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
}
}
else if (es->analyze)
@@ -2095,13 +2166,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_indexsearches_info(planstate, es);
break;
case T_IndexOnlyScan:
@@ -2109,16 +2180,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
show_indexsearches_info(planstate, es);
break;
case T_BitmapIndexScan:
@@ -2131,11 +2202,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2152,7 +2223,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2163,7 +2234,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2187,7 +2258,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2221,7 +2292,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2235,7 +2306,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2253,7 +2324,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2270,14 +2341,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2287,7 +2358,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2297,11 +2368,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2310,11 +2381,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2323,11 +2394,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2335,7 +2406,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_window_def(castNode(WindowAggState, planstate), ancestors, es);
@@ -2344,7 +2415,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
break;
case T_Group:
@@ -2352,7 +2423,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2374,7 +2445,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2419,10 +2490,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -4105,19 +4176,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4788,7 +4859,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4797,11 +4868,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4821,11 +4905,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -5107,3 +5204,644 @@ ExplainFlushWorkersState(ExplainState *es)
pfree(wstate->worker_state_save);
pfree(wstate);
}
+
+/*
+ * ProgressiveExplainSetup -
+ * Adjusts QueryDesc with instrumentation for progressive explains.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we adjust QueryDesc accordingly even if
+ * the query was not initiated with EXPLAIN ANALYZE. This will
+ * directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Adjust instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+}
+
+/*
+ * ProgressiveExplainStart
+ * Progressive explain start point.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ activeQueryDesc = queryDesc;
+ queryDesc->pestate = NULL;
+
+ /* Timeout is only needed if duration > 0 */
+ if (progressive_explain_min_duration == 0)
+ ProgressiveExplainInit(queryDesc);
+ else
+ enable_timeout_after(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ progressive_explain_min_duration);
+}
+
+/*
+ * ProgressiveExplainInit -
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ * We define a ExplainState that will be reused in every iteration of
+ * plan prints.
+ *
+ * Progressive explain plans are printed in shared memory via DSAs.
+ * A dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer pointers.
+ *
+ * A memory context release callback is defined for manual resource release
+ * in case of query cancellations.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise
+ * the plain (non-instrumented) plan is printed only once.
+ */
+void
+ProgressiveExplainInit(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ progressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ProgressiveExplainReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pestate = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+ es->costs = progressive_explain_costs;
+
+ /* Define the DSA and share through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_ENTER,
+ &found);
+
+ entry->h = dsa_get_handle(es->pe_a);
+ entry->p = (dsa_pointer) NULL;
+
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+ /* Print the progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainTrigger -
+ * Called by the timeout handler to start printing progressive
+ * explain plans.
+ */
+void
+ProgressiveExplainTrigger(void)
+{
+ WrapExecProcNodeWithExplain(activeQueryDesc->planstate);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pestate->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pestate->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint -
+ * Prints the progressive explain plan to shared memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the currently printed plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ progressiveExplainHashEntry *entry;
+ progressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pestate;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_FIND,
+ NULL);
+
+ /* Entry exists */
+ if (entry)
+ {
+ /* Plan was never printed */
+ if (!entry->p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->p);
+
+ /*
+ * Plan does not fit in existing shared memory area.
+ * Reallocation is needed.
+ */
+ if (strlen(es->str->data) >
+ add_size(strlen(pe_data->plan),
+ PROGRESSIVE_EXPLAIN_FREE_SIZE))
+ {
+ dsa_free(es->pe_a, entry->p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently
+ * printed query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_FREE_SIZE. This strategy prevents
+ * having to reallocate the segment very often, which would be
+ * needed in case the length of the next printed plan exceeds the
+ * previously allocated size.
+ */
+ entry->p = dsa_allocate(es->pe_a,
+ add_size(sizeof(progressiveExplainData),
+ add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_FREE_SIZE)));
+ pe_data = dsa_get_address(es->pe_a, entry->p);
+ pe_data->pid = MyProcPid;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_update = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ProgressiveExplainHashLock);
+ }
+}
+
+/*
+ * ProgressiveExplainFinish -
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ /* Startup timeout hasn't triggered yet, just disable it */
+ if (get_timeout_active(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT))
+ disable_timeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT, false);
+ /* Initial progressive explain was done, clean everything */
+ else if (queryDesc && queryDesc->pestate)
+ ProgressiveExplainCleanup(queryDesc);
+}
+
+/*
+ * ProgressiveExplainCleanup -
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures not automatically released by the memory
+ * context removal. Current tasks are:
+ * - remove local backend from progressive explain hash
+ * - detach from DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(QueryDesc *queryDesc)
+{
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ /* Reset querydesc tracker */
+ activeQueryDesc = NULL;
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /*
+ * Only detach from the DSA if the query ended gracefully, i.e., if
+ * ProgressiveExplainCleanup was called from
+ * ProgressiveExplainFinish.
+ */
+ if (queryDesc)
+ dsa_detach(queryDesc->pestate->pe_a);
+ hash_search(progressiveExplainHash, &MyProcPid, HASH_REMOVE, NULL);
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ExecProcNodeExplain -
+ * ExecProcNode wrapper that initializes progressive explain
+ * and unwraps ExecProcNode back to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)
+{
+ /* Initialize progressive explain */
+ ProgressiveExplainInit(node->state->query_desc);
+
+ /* Unwrap exec proc node for all nodes */
+ UnwrapExecProcNodeWithExplain(node->state->query_desc->planstate);
+
+ /*
+ * Since unwrapping has already been done, call ExecProcNode(), not
+ * ExecProcNodeOriginal().
+ */
+ return node->ExecProcNode(node);
+}
+
+/*
+ * ExecProcNodeInstrExplain -
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update the progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * WrapMultiExecProcNodesWithExplain -
+ * Call WrapExecProcNodeWithExplain for each PlanState in an array
+ */
+static void
+WrapMultiExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ WrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * WrapCustomPlanChildExecProcNodesWithExplain -
+ * Call WrapExecProcNodeWithExplain for each child of a CustomScanState
+ */
+static void
+WrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ WrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * WrapExecProcNodeWithExplain -
+ * Recursively wrap ExecProcNode with ExecProcNodeExplain
+ */
+static void
+WrapExecProcNodeWithExplain(PlanState *ps)
+{
+ /* wrapping can be done only once */
+ if (ps->ExecProcNodeOriginal != NULL)
+ return;
+
+ check_stack_depth();
+
+ ps->ExecProcNodeOriginal = ps->ExecProcNode;
+ ps->ExecProcNode = ExecProcNodeExplain;
+
+ if (ps->lefttree != NULL)
+ WrapExecProcNodeWithExplain(ps->lefttree);
+ if (ps->righttree != NULL)
+ WrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ WrapMultiExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ WrapMultiExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ WrapMultiExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ WrapMultiExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ WrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ WrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * UnwrapMultiExecProcNodesWithExplain -
+ * Call UnwrapExecProcNodeWithExplain for each PlanState in an array
+ */
+static void
+UnwrapMultiExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ UnwrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * UnwrapCustomPlanChildExecProcNodesWithExplain -
+ * Call UnwrapExecProcNodeWithExplain for each child of a CustomScanState
+ */
+static void
+UnwrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ UnwrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * UnwrapExecProcNodeWithExplain -
+ * Recursively restore the original ExecProcNode
+ */
+static void
+UnwrapExecProcNodeWithExplain(PlanState *ps)
+{
+ Assert(ps->ExecProcNodeOriginal != NULL);
+
+ check_stack_depth();
+
+ ps->ExecProcNode = ps->ExecProcNodeOriginal;
+ ps->ExecProcNodeOriginal = NULL;
+
+ if (ps->lefttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->lefttree);
+ if (ps->righttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ UnwrapMultiExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ UnwrapMultiExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ UnwrapMultiExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ UnwrapMultiExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ UnwrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ UnwrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * ProgressiveExplainReleaseFunc -
+ * Memory context release callback function to remove
+ * plan from explain hash and disable the timeout.
+ */
+static void
+ProgressiveExplainReleaseFunc(void *arg)
+{
+ progressiveExplainHashEntry *entry;
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_SHARED);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_FIND,
+ NULL);
+ LWLockRelease(ProgressiveExplainHashLock);
+ if (entry)
+ ProgressiveExplainCleanup(NULL);
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(progressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash -
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(int);
+ info.entrysize = sizeof(progressiveExplainHashEntry);
+
+ progressiveExplainHash = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain -
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 4
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ HASH_SEQ_STATUS hash_seq;
+ progressiveExplainHashEntry *entry;
+ dsa_area *a;
+ progressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainHash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->p) ||
+ MyProcPid == entry->pid)
+ continue;
+
+ a = dsa_attach(entry->h);
+ ped = dsa_get_address(a, entry->p);
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+ values[1] = Int32GetDatum(ped->pid);
+ values[2] = TimestampTzGetDatum(ped->last_update);
+
+ if (superuser())
+ values[3] = CStringGetTextDatum(ped->plan);
+ else
+ {
+ if (beentry->st_userid == GetUserId())
+ values[3] = CStringGetTextDatum(ped->plan);
+ else
+ values[3] = CStringGetTextDatum("<insufficient privilege>");
+ }
+ break;
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0493b7d5365..52b8b2bd1f7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -157,6 +158,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Setup progressive explain if enabled.
+ */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -182,6 +189,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Add a back reference to the QueryDesc.
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -267,6 +279,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -516,6 +534,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..3af8e9d964d 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -461,8 +463,14 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed70367..15d8a3b28a8 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, WaitEventCustomShmemSize());
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -300,6 +302,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up the progressive explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 5702c35bb91..7097312b1a8 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -177,6 +177,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 3c594415bfd..17e88e98252 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -346,6 +346,7 @@ WALSummarizer "Waiting to read or update WAL summarization state."
DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
+ProgressiveExplainHash "Waiting to access backend progressive explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 4b2faf1ba9d..67d8ed2be64 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -81,6 +82,8 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainStartupTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -764,6 +767,10 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ ProgressiveExplainStartupTimeoutHandler);
}
/*
@@ -1425,6 +1432,18 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainStartupTimeoutHandler(void)
+{
+ ProgressiveExplainTrigger();
+}
+
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 508970680d1..7e3878a1fde 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -40,6 +40,7 @@
#include "catalog/storage.h"
#include "commands/async.h"
#include "commands/event_trigger.h"
+#include "commands/explain.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -476,6 +477,14 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -530,6 +539,16 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = true;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+bool progressive_explain_costs = true;
+int progressive_explain_min_duration = 1000;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2109,6 +2128,72 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on buffer usage is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_costs", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on the estimated startup and total cost of each plan node is added to progressive explains."),
+ gettext_noop("Equivalent to the COSTS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_costs,
+ true,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3776,6 +3861,30 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_min_duration", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Min query duration to start printing instrumented "
+ "progressive explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_min_duration,
+ 1000, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5302,6 +5411,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 36cb64d7ebc..8d458a8f66f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -657,6 +657,20 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_min_duration = 1s
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = on
+#progressive_explain_costs = on
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 890822eaf79..c6ecd537aaa 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12479,4 +12479,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => '',
+ proallargtypes => '{oid,int4,timestamptz,text}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{datid,pid,last_update,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 64547bd9b9c..e2a994baf5d 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -13,6 +13,7 @@
#ifndef EXPLAIN_H
#define EXPLAIN_H
+#include "datatype/timestamp.h"
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
@@ -32,6 +33,13 @@ typedef enum ExplainFormat
EXPLAIN_FORMAT_YAML,
} ExplainFormat;
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
typedef struct ExplainWorkersState
{
int num_workers; /* # of worker processes the plan used */
@@ -67,12 +75,34 @@ typedef struct ExplainState
List *deparse_cxt; /* context list for deparsing expressions */
Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
bool hide_workers; /* set if we find an invisible Gather */
+ bool progressive; /* set if tracking a progressive explain */
int rtable_size; /* length of rtable excluding the RTE_GROUP
* entry */
/* state related to the current plan node */
ExplainWorkersState *workers_state; /* needed if parallel plan */
+
+ /* current plan node in progressive explains */
+ struct PlanState *pe_curr_node;
+ /* reusable instr object used in progressive explains */
+ struct Instrumentation *pe_local_instr;
+ /* dsa area used to store progressive explain data */
+ dsa_area *pe_a;
} ExplainState;
+typedef struct progressiveExplainHashEntry
+{
+ int pid; /* hash key of entry - MUST BE FIRST */
+ dsa_handle h;
+ dsa_pointer p;
+} progressiveExplainHashEntry;
+
+typedef struct progressiveExplainData
+{
+ int pid;
+ TimestampTz last_update;
+ char plan[];
+} progressiveExplainData;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -120,4 +150,15 @@ extern void ExplainPrintJITSummary(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainQueryText(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int maxlen);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+extern bool ProgressiveExplainPending;
+
#endif /* EXPLAIN_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..27692aee542 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pestate; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 575b0b1bd24..47f7040af29 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -763,6 +764,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1159,6 +1163,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+ ExecProcNodeMtd ExecProcNodeOriginal; /* pointer to original function
+ * if another wrapper was added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index ffa03189e2d..f3499d307f4 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -218,6 +218,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf565452382..1ff746dd5a2 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -83,3 +83,4 @@ PG_LWLOCK(49, WALSummarizer)
PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ProgressiveExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 24444cbc365..08ac979cb53 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,16 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_min_duration;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
+extern PGDLLIMPORT bool progressive_explain_costs;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
@@ -321,6 +331,7 @@ extern PGDLLIMPORT const struct config_enum_entry dynamic_shared_memory_options[
extern PGDLLIMPORT const struct config_enum_entry recovery_target_action_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_level_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_sync_method_options[];
+extern PGDLLIMPORT const struct config_enum_entry explain_format_options[];
/*
* Functions exported by guc.c
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..ea66a0505d9 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,8 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..05e555a5e26
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the local
+# session, even if progressive explain is enabled. For other sessions, the view
+# should contain data for a PID only if progressive_explain is enabled and a query
+# is running. Data needs to be removed when the query finishes (or gets cancelled).
+
+use strict;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
+# Configure progressive explains to be printed immediately
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled, pg_stat_progress_explain should not
+ # contain a row for the PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # One row for the PID is expected for a running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('explain');
+test_local_session('analyze');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('explain');
+test_peer_session('analyze');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 62f69ac20b2..6aac983be42 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,13 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT s.datid,
+ d.datname,
+ s.pid,
+ s.last_update,
+ s.query_plan
+ FROM (pg_stat_progress_explain() s(datid, pid, last_update, query_plan)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 93339ef3c58..09c21dc28b8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2270,6 +2270,7 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
ProjectSet
ProjectSetPath
ProjectSetState
@@ -3875,6 +3876,8 @@ process_sublinks_context
proclist_head
proclist_mutable_iter
proclist_node
+progressiveExplainData
+progressiveExplainHashEntry
promptStatus_t
pthread_barrier_t
pthread_cond_t
--
2.43.0
Sending a new version as a rebase was required.
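For anyone who wants to try v10 quickly, here is a minimal sketch based on the GUCs and view columns in this version (it assumes the patch is applied; the table name big_table and the specific settings are just illustrative, not part of the patch):

-- Session 1: enable instrumented progressive explain and run a long query
SET progressive_explain = 'analyze';
SET progressive_explain_min_duration = '500ms';
SET progressive_explain_interval = '1s';
SELECT count(*) FROM big_table;

-- Session 2: watch the plan and its per-node counters evolve while session 1 runs
SELECT pid, last_update, query_plan
  FROM pg_stat_progress_explain;

The second session should only see a row while the query in the first session is still running, which matches what the new TAP test checks.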
Rafael.
Attachments:
v10-0001-Proposal-for-progressive-explains.patch (application/octet-stream)
From 7b17718be73ba4aa690bf936a05479d0f9771fe6 Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Wed, 19 Mar 2025 14:40:25 -0300
Subject: [PATCH v10] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan will be printed
only once, after progressive_explain_min_duration. If set to 'analyze',
the QueryDesc will be adjusted to add instrumentation flags. In that
case the plan will be printed at a fixed interval controlled by the new
GUC parameter progressive_explain_interval, including all instrumentation
stats computed so far (per node rows and execution time).
New view:
- pg_stat_progress_explain
- datid: OID of the database
- datname: Name of the database
- pid: PID of the process running the query
- last_update: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_min_duration: min query duration until progressive
explain starts.
- type: int
- default: 1s
- min: 0
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [TEXT, XML, JSON, or YAML]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffer details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: true
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_costs: controls whether estimated startup and
total cost of each plan node is added to the printed plan.
- type: bool
- default: true
- context: user
---
contrib/auto_explain/auto_explain.c | 10 +-
doc/src/sgml/config.sgml | 150 ++++
doc/src/sgml/monitoring.sgml | 82 ++
doc/src/sgml/perform.sgml | 97 ++
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/explain.c | 831 +++++++++++++++++-
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 10 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 19 +
src/backend/utils/misc/guc_tables.c | 131 +++
src/backend/utils/misc/postgresql.conf.sample | 14 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain.h | 33 +
src/include/commands/explain_state.h | 9 +
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 11 +
src/include/utils/timeout.h | 2 +
.../test_misc/t/008_progressive_explain.pl | 130 +++
src/test/regress/expected/rules.out | 7 +
src/tools/pgindent/typedefs.list | 3 +
28 files changed, 1563 insertions(+), 59 deletions(-)
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 3b73bd19107..769b79ecc6b 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -39,14 +39,6 @@ static int auto_explain_log_level = LOG;
static bool auto_explain_log_nested_statements = false;
static double auto_explain_sample_rate = 1;
-static const struct config_enum_entry format_options[] = {
- {"text", EXPLAIN_FORMAT_TEXT, false},
- {"xml", EXPLAIN_FORMAT_XML, false},
- {"json", EXPLAIN_FORMAT_JSON, false},
- {"yaml", EXPLAIN_FORMAT_YAML, false},
- {NULL, 0, false}
-};
-
static const struct config_enum_entry loglevel_options[] = {
{"debug5", DEBUG5, false},
{"debug4", DEBUG4, false},
@@ -188,7 +180,7 @@ _PG_init(void)
NULL,
&auto_explain_log_format,
EXPLAIN_FORMAT_TEXT,
- format_options,
+ explain_format_options,
PGC_SUSET,
0,
NULL,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 873290daa61..19357194b26 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8682,6 +8682,156 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum time (in milliseconds) a query must run before its progressive
+ explain plan is printed for the first time. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on buffer usage is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on per node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-costs" xreflabel="progressive_explain_costs">
+ <term><varname>progressive_explain_costs</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_costs</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on the estimated startup and total cost of
+ each plan node is added to progressive explains. Equivalent to the COSTS
+ option of EXPLAIN.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index aaa6586d3a4..9f654c4e649 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6828,6 +6828,88 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_update</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 387baac7e8c..04a78f29df9 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1169,6 +1169,103 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+   The query plan of any active query can be viewed by other sessions
+   through the <xref linkend="pg-stat-progress-explain-view"/> view when
+   <xref linkend="guc-progressive-explain"/> is enabled in the client
+   backend or parallel worker executing the query, once the minimum duration
+   specified by <xref linkend="guc-progressive-explain-min-duration"/> has
+   elapsed. The settings <xref linkend="guc-progressive-explain-timing"/>,
+   <xref linkend="guc-progressive-explain-buffers"/> and
+   <xref linkend="guc-progressive-explain-wal"/> control which additional
+   instrumentation details are collected and included in the output, while
+   <xref linkend="guc-progressive-explain-format"/>,
+   <xref linkend="guc-progressive-explain-verbose"/>,
+   <xref linkend="guc-progressive-explain-settings"/> and
+   <xref linkend="guc-progressive-explain-costs"/>
+   define how the plan is printed and which details it includes.
+ </para>
+
+ <para>
+   When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>,
+   the plan is printed only once, at the start of query execution.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:01.606324-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+   Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+   enables instrumentation, and the detailed plan is printed at a fixed interval
+   controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+   per-node accumulated row counts and other statistics.
+ </para>
+
+ <para>
+   Progressive explains include additional per-node information to help
+   analyze execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+      never executed: a plan node that has not been processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:53.951884-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=1581.469..2963.158 rows=64862.00 loops=1) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.079..486.702 rows=8258962.00 loops=1) (current)+
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=1580.933..1580.933 rows=10000000.00 loops=1) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.004..566.961 rows=10000000.00 loops=1) +
+ | | | |
+(1 row)
+</screen>
+ </para>
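+
+  <para>
+   The view can be joined with <structname>pg_stat_activity</structname>
+   to see the text of a query next to its current plan, for example:
+<screen>
+SELECT a.query, e.last_update, e.query_plan
+FROM pg_stat_progress_explain e
+JOIN pg_stat_activity a USING (pid);
+</screen>
+  </para>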
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a4d2cfdcaf5..460946a4079 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,16 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ S.datid AS datid,
+ D.datname AS datname,
+ S.pid,
+ S.last_update,
+ S.query_plan
+ FROM pg_stat_progress_explain() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 33a16d2d8e2..d0ac8d2e998 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -23,6 +23,7 @@
#include "commands/explain_state.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
+#include "funcapi.h"
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
@@ -34,6 +35,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
+#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/guc_tables.h"
#include "utils/json.h"
@@ -41,11 +43,25 @@
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
+#include "utils/timeout.h"
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
+#define PROGRESSIVE_EXPLAIN_FREE_SIZE 4096
+
+/*
+ * GUC support
+ */
+const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
/* Hook for plugins to get control in ExplainOneQuery() */
ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
@@ -56,6 +72,15 @@ explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
explain_per_plan_hook_type explain_per_plan_hook = NULL;
explain_per_node_hook_type explain_per_node_hook = NULL;
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainHash = NULL;
+
+/* Pointer to running query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Flag set by timeouts to control when to update progressive explains */
+bool ProgressiveExplainPending = false;
+
/*
* Various places within need to convert bytes to kilobytes. Round these up
* to the next whole kilobyte.
@@ -139,7 +164,7 @@ static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -165,6 +190,13 @@ static ExplainWorkersState *ExplainCreateWorkersState(int num_workers);
static void ExplainOpenWorker(int n, ExplainState *es);
static void ExplainCloseWorker(int n, ExplainState *es);
static void ExplainFlushWorkersState(ExplainState *es);
+static void ProgressiveExplainInit(QueryDesc *queryDesc);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(QueryDesc *queryDesc);
+static TupleTableSlot *ExecProcNodeExplain(PlanState *node);
+static void WrapExecProcNodeWithExplain(PlanState *ps);
+static void UnwrapExecProcNodeWithExplain(PlanState *ps);
+static void ProgressiveExplainReleaseFunc(void *);
@@ -596,6 +628,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
/* We can't run ExecutorEnd 'till we're done printing the stats... */
totaltime += elapsed_time(&starttime);
}
+ else
+ {
+ /*
+ * Handle progressive explain cleanup manually if enabled, as
+ * ExecutorFinish (which would normally perform it) is not called in
+ * this code path.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+ }
/* grab serialization metrics before we destroy the DestReceiver */
if (es->serialize != EXPLAIN_SERIALIZE_NONE)
@@ -1371,6 +1412,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1834,17 +1876,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is performed directly on
+ * the main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop for nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1854,6 +1917,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
}
else
{
@@ -1866,6 +1932,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
}
}
else if (es->analyze)
@@ -1970,13 +2040,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_indexsearches_info(planstate, es);
break;
case T_IndexOnlyScan:
@@ -1984,16 +2054,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
show_indexsearches_info(planstate, es);
break;
case T_BitmapIndexScan:
@@ -2006,11 +2076,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2027,7 +2097,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2038,7 +2108,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2062,7 +2132,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2096,7 +2166,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2110,7 +2180,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2128,7 +2198,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2145,14 +2215,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2162,7 +2232,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2172,11 +2242,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2185,11 +2255,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2198,11 +2268,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2210,7 +2280,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_window_def(castNode(WindowAggState, planstate), ancestors, es);
@@ -2219,7 +2289,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
break;
case T_Group:
@@ -2227,7 +2297,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2249,7 +2319,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2294,10 +2364,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3985,19 +4055,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4678,7 +4748,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, mtstate->ps.instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4687,11 +4757,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4711,11 +4794,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
@@ -4997,3 +5093,644 @@ ExplainFlushWorkersState(ExplainState *es)
pfree(wstate->worker_state_save);
pfree(wstate);
}
+
+/*
+ * ProgressiveExplainSetup -
+ * Adjusts QueryDesc with instrumentation for progressive explains.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we adjust the QueryDesc accordingly even if
+ * the query was not started with EXPLAIN ANALYZE. This directly affects
+ * query execution and adds computational overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Adjust instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+}
+
+/*
+ * ProgressiveExplainStart
+ * Progressive explain start point.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ activeQueryDesc = queryDesc;
+ queryDesc->pestate = NULL;
+
+ /* A startup timeout is only needed if the minimum duration is > 0 */
+ if (progressive_explain_min_duration == 0)
+ ProgressiveExplainInit(queryDesc);
+ else
+ enable_timeout_after(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ progressive_explain_min_duration);
+}
+
+/*
+ * ProgressiveExplainInit -
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ * We define an ExplainState that is reused for every plan print
+ * iteration.
+ *
+ * Progressive explain plans are printed to shared memory via DSAs. A
+ * dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer values.
+ *
+ * A memory context release callback is defined for manual resource release
+ * in case of query cancellations.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise
+ * the plan is printed only once.
+ */
+void
+ProgressiveExplainInit(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ progressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Configure memory context release callback */
+ MemoryContextCallback *queryDescReleaseCallback;
+
+ queryDescReleaseCallback = (MemoryContextCallback *)
+ palloc0(sizeof(MemoryContextCallback));
+ queryDescReleaseCallback->func = ProgressiveExplainReleaseFunc;
+ queryDescReleaseCallback->arg = NULL;
+ MemoryContextRegisterResetCallback(CurrentMemoryContext,
+ queryDescReleaseCallback);
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pestate = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
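+ /* Map the progressive explain GUCs onto the equivalent EXPLAIN options */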
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+ es->costs = progressive_explain_costs;
+
+ /* Define the DSA and share through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_ENTER,
+ &found);
+
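+ /* Publish the DSA handle; entry->p is set at the first plan print */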
+ entry->h = dsa_get_handle(es->pe_a);
+ entry->p = (dsa_pointer) NULL;
+
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+ /* Print the progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainTrigger -
+ * Called by the timeout handler to start printing progressive
+ * explain plans.
+ */
+void
+ProgressiveExplainTrigger(void)
+{
+ WrapExecProcNodeWithExplain(activeQueryDesc->planstate);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pestate->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pestate->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint -
+ * Prints progressive explain in memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the currently printed plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+
+ /* Produce a plan only if descriptor is being tracked */
+ if (queryDesc &&
+ queryDesc->planstate)
+ {
+ QueryDesc *currentQueryDesc = queryDesc;
+
+ progressiveExplainHashEntry *entry;
+ progressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pestate;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_FIND,
+ NULL);
+
+ /* Entry exists */
+ if (entry)
+ {
+ /* Plan was never printed */
+ if (!entry->p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->p);
+
+ /*
+ * Plan does not fit in existing shared memory area.
+ * Reallocation is needed.
+ */
+ if (strlen(es->str->data) >
+ add_size(strlen(pe_data->plan),
+ PROGRESSIVE_EXPLAIN_FREE_SIZE))
+ {
+ dsa_free(es->pe_a, entry->p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently
+ * printed query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_FREE_SIZE. This strategy avoids having to
+ * reallocate the segment too often, which would otherwise be needed
+ * whenever the length of the next printed plan exceeds the
+ * previously allocated size.
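+ * For example, a 1 kB plan results in an allocation of roughly
+ * 5 kB: plan length plus PROGRESSIVE_EXPLAIN_FREE_SIZE (4 kB)
+ * plus the struct header.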
+ */
+ entry->p = dsa_allocate(es->pe_a,
+ add_size(sizeof(progressiveExplainData),
+ add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_FREE_SIZE)));
+ pe_data = dsa_get_address(es->pe_a, entry->p);
+ pe_data->pid = MyProcPid;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_update = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ProgressiveExplainHashLock);
+ }
+}
+
+/*
+ * ProgressiveExplainFinish -
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ /* Startup timeout hasn't triggered yet, just disable it */
+ if (get_timeout_active(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT))
+ disable_timeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT, false);
+ /* Initial progressive explain was done, clean everything */
+ else if (queryDesc && queryDesc->pestate)
+ ProgressiveExplainCleanup(queryDesc);
+}
+
+/*
+ * ProgressiveExplainCleanup -
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures that are not automatically released by
+ * memory context deletion. Current tasks are:
+ * - remove the local backend's entry from the progressive explain hash
+ * - detach from the DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(QueryDesc *queryDesc)
+{
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ /* Reset querydesc tracker */
+ activeQueryDesc = NULL;
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /*
+ * Only detach from the DSA if the query ended gracefully, i.e., if
+ * ProgressiveExplainCleanup was called from ProgressiveExplainFinish.
+ */
+ if (queryDesc)
+ dsa_detach(queryDesc->pestate->pe_a);
+ hash_search(progressiveExplainHash, &MyProcPid, HASH_REMOVE, NULL);
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ExecProcNodeExplain -
+ * ExecProcNode wrapper that initializes progressive explain
+ * and unwraps ExecProcNode back to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)
+{
+ /* Initialize progressive explain */
+ ProgressiveExplainInit(node->state->query_desc);
+
+ /* Unwrap exec proc node for all nodes */
+ UnwrapExecProcNodeWithExplain(node->state->query_desc->planstate);
+
+ /*
+ * Since unwrapping has already been done, call ExecProcNode(), not
+ * ExecProcNodeOriginal().
+ */
+ return node->ExecProcNode(node);
+}
+
+/*
+ * ExecProcNodeInstrExplain -
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
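+ /* Same instrumentation bracketing as ExecProcNodeInstr */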
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update the progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * WrapMultiExecProcNodesWithExplain -
+ * Wrap the ExecProcNode of each PlanState in an array with ExecProcNodeExplain
+ */
+static void
+WrapMultiExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ WrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * WrapCustomPlanChildExecProcNodesWithExplain -
+ * Wrap the ExecProcNode of a CustomScanState's children with ExecProcNodeExplain
+ */
+static void
+WrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ WrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * WrapExecProcNodeWithExplain -
+ * Wrap ExecProcNode with ExecProcNodeExplain recursively
+ */
+static void
+WrapExecProcNodeWithExplain(PlanState *ps)
+{
+ /* wrapping can be done only once */
+ if (ps->ExecProcNodeOriginal != NULL)
+ return;
+
+ check_stack_depth();
+
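+ /* Save the current entry point and install the explain wrapper */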
+ ps->ExecProcNodeOriginal = ps->ExecProcNode;
+ ps->ExecProcNode = ExecProcNodeExplain;
+
+ if (ps->lefttree != NULL)
+ WrapExecProcNodeWithExplain(ps->lefttree);
+ if (ps->righttree != NULL)
+ WrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ WrapMultiExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ WrapMultiExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ WrapMultiExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ WrapMultiExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ WrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ WrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * UnwrapMultiExecProcNodesWithExplain -
+ * Restore the original ExecProcNode of each PlanState in an array
+ */
+static void
+UnwrapMultiExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ UnwrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * UnwrapCustomPlanChildExecProcNodesWithExplain -
+ * Restore the original ExecProcNode of a CustomScanState's children
+ */
+static void
+UnwrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ UnwrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * UnwrapExecProcNodeWithExplain -
+ * Restore the original ExecProcNode recursively
+ */
+static void
+UnwrapExecProcNodeWithExplain(PlanState *ps)
+{
+ Assert(ps->ExecProcNodeOriginal != NULL);
+
+ check_stack_depth();
+
+ ps->ExecProcNode = ps->ExecProcNodeOriginal;
+ ps->ExecProcNodeOriginal = NULL;
+
+ if (ps->lefttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->lefttree);
+ if (ps->righttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ UnwrapMultiExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ UnwrapMultiExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ UnwrapMultiExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ UnwrapMultiExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ UnwrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ UnwrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * ProgressiveExplainReleaseFunc -
+ * Memory context release callback used to remove the plan from the
+ * progressive explain hash and disable the timeout.
+ */
+static void
+ProgressiveExplainReleaseFunc(void *arg)
+{
+ progressiveExplainHashEntry *entry;
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_SHARED);
+ entry = (progressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_FIND,
+ NULL);
+ LWLockRelease(ProgressiveExplainHashLock);
+ if (entry)
+ ProgressiveExplainCleanup(NULL);
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(progressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash -
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(int);
+ info.entrysize = sizeof(progressiveExplainHashEntry);
+
+ progressiveExplainHash = ShmemInitHash("progressive explain hash",
+ 50,
+ add_size(MaxBackends, max_parallel_workers),
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain -
+ * Return information about in-progress progressive explains.
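+ *
+ * The plan text is shown only to superusers and to the role that owns the
+ * backend; other callers see <insufficient privilege>.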
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 4
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ HASH_SEQ_STATUS hash_seq;
+ progressiveExplainHashEntry *entry;
+ dsa_area *a;
+ progressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainHash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->p) ||
+ MyProcPid == entry->pid)
+ continue;
+
+ a = dsa_attach(entry->h);
+ ped = dsa_get_address(a, entry->p);
+
+ /* Scan the backend status array (1-based index) for the publishing PID */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+ values[1] = Int32GetDatum(ped->pid);
+ values[2] = TimestampTzGetDatum(ped->last_update);
+
+ if (superuser())
+ values[3] = CStringGetTextDatum(ped->plan);
+ else
+ {
+ if (beentry->st_userid == GetUserId())
+ values[3] = CStringGetTextDatum(ped->plan);
+ else
+ values[3] = CStringGetTextDatum("<insufficient privilege>");
+ }
+ break;
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e9bd98c7738..aeccbd1ff48 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -157,6 +158,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Set up progressive explain instrumentation options if enabled.
+ */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -182,6 +189,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Adding back reference to QueryDesc
+ * Add a back-reference to the QueryDesc.
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -267,6 +279,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -516,6 +534,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..3af8e9d964d 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -461,8 +463,14 @@ ExecProcNodeFirst(PlanState *node)
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
+
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
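+ /* Also publish progressive explain plans while instrumenting */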
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..8ec9bf18d36 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -302,6 +304,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up the progressive explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 5702c35bb91..7097312b1a8 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -177,6 +177,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 9fa12a555e8..b53bc61d0d8 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -349,6 +349,7 @@ DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
AioWorkerSubmissionQueue "Waiting to access AIO worker submission queue."
+ProgressiveExplainHash "Waiting to access backend progressive explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7958ea11b73..9c70323ba23 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -82,6 +83,8 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainStartupTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -771,6 +774,10 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ ProgressiveExplainStartupTimeoutHandler);
}
/*
@@ -1432,6 +1439,18 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainStartupTimeoutHandler(void)
+{
+ ProgressiveExplainTrigger();
+}
+
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index cc8f2b1230a..39b9b0ad0ee 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -41,6 +41,8 @@
#include "commands/async.h"
#include "commands/extension.h"
#include "commands/event_trigger.h"
+#include "commands/explain.h"
+#include "commands/explain_state.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -479,6 +481,14 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -533,6 +543,16 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = true;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+bool progressive_explain_costs = true;
+int progressive_explain_min_duration = 1000;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2131,6 +2151,73 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on buffer usage is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_costs", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on the estimated startup and total cost of each plan node is added to progressive explains."),
+ gettext_noop("Equivalent to the COSTS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_costs,
+ true,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3836,6 +3923,30 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between updates of instrumented "
+ "progressive explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_min_duration", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the minimum query duration before instrumented "
+ "progressive explains are printed."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_min_duration,
+ 1000, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5384,6 +5495,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, assign_io_method, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ad54585cf1d..cc5b42da2cd 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -668,6 +668,20 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_min_duration = 1s
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = on
+#progressive_explain_costs = on
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 890822eaf79..c6ecd537aaa 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12479,4 +12479,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about query plans of backends with progressive explain enabled',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => '',
+ proallargtypes => '{oid,int4,timestamptz,text}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{datid,pid,last_update,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 387839eb5d2..70079571391 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -13,11 +13,33 @@
#ifndef EXPLAIN_H
#define EXPLAIN_H
+#include "datatype/timestamp.h"
#include "executor/executor.h"
#include "parser/parse_node.h"
struct ExplainState; /* defined in explain_state.h */
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
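+/* Shared hash entry pointing to a backend's progressive explain DSA */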
+typedef struct progressiveExplainHashEntry
+{
+ int pid; /* hash key of entry - MUST BE FIRST */
+ dsa_handle h;
+ dsa_pointer p;
+} progressiveExplainHashEntry;
+
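+/* Per-backend progressive explain data stored in its DSA */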
+typedef struct progressiveExplainData
+{
+ int pid;
+ TimestampTz last_update;
+ char plan[FLEXIBLE_ARRAY_MEMBER];
+} progressiveExplainData;
+
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
@@ -83,4 +105,15 @@ extern void ExplainQueryText(struct ExplainState *es, QueryDesc *queryDesc);
extern void ExplainQueryParameters(struct ExplainState *es,
ParamListInfo params, int maxlen);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+extern bool ProgressiveExplainPending;
+
#endif /* EXPLAIN_H */
diff --git a/src/include/commands/explain_state.h b/src/include/commands/explain_state.h
index 925097492b9..63ef2a7a298 100644
--- a/src/include/commands/explain_state.h
+++ b/src/include/commands/explain_state.h
@@ -16,6 +16,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "parser/parse_node.h"
+#include "utils/dsa.h"
typedef enum ExplainSerializeOption
{
@@ -74,6 +75,14 @@ typedef struct ExplainState
/* extensions */
void **extension_state;
int extension_state_allocated;
+ /* set if tracking a progressive explain */
+ bool progressive;
+ /* current plan node in progressive explains */
+ struct PlanState *pe_curr_node;
+ /* reusable instr object used in progressive explains */
+ struct Instrumentation *pe_local_instr;
+ /* dsa area used to store progressive explain data */
+ dsa_area *pe_a;
} ExplainState;
typedef void (*ExplainOptionHandler) (ExplainState *, DefElem *, ParseState *);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..27692aee542 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pestate; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d4d4e655180..e3afeab01ee 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -763,6 +764,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1159,6 +1163,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+ ExecProcNodeMtd ExecProcNodeOriginal; /* pointer to original function when
+ * another wrapper was added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index ffa03189e2d..f3499d307f4 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -218,6 +218,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 932024b1b0b..7d88e7e9b58 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -84,3 +84,4 @@ PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
PG_LWLOCK(53, AioWorkerSubmissionQueue)
+PG_LWLOCK(54, ProgressiveExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index f619100467d..cff5c1f4cdb 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,16 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_min_duration;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
+extern PGDLLIMPORT bool progressive_explain_costs;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
@@ -322,6 +332,7 @@ extern PGDLLIMPORT const struct config_enum_entry io_method_options[];
extern PGDLLIMPORT const struct config_enum_entry recovery_target_action_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_level_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_sync_method_options[];
+extern PGDLLIMPORT const struct config_enum_entry explain_format_options[];
/*
* Functions exported by guc.c
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..ea66a0505d9 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,8 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..05e555a5e26
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the local
+# session, even if progressive explain is enabled. For other sessions pg_stat_progress_explain
+# should contain data for a PID only if progressive_explain is enabled and a query
+# is running. Data needs to be removed when query finishes (or gets cancelled).
+
+use strict;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
+# Configure progressive explain to be logged immediately
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled pg_stat_progress_explain should not contain
+ # row for PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # 1 row for pid is expected for running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('explain');
+test_local_session('analyze');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('explain');
+test_peer_session('analyze');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 62f69ac20b2..6aac983be42 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,13 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT s.datid,
+ d.datname,
+ s.pid,
+ s.last_update,
+ s.query_plan
+ FROM (pg_stat_progress_explain() s(datid, pid, last_update, query_plan)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bfa276d2d35..a29a38ac698 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2295,6 +2295,7 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
ProjectSet
ProjectSetPath
ProjectSetState
@@ -3900,6 +3901,8 @@ process_sublinks_context
proclist_head
proclist_mutable_iter
proclist_node
+progressiveExplainData
+progressiveExplainHashEntry
promptStatus_t
pthread_barrier_t
pthread_cond_t
--
2.43.0
On Wed, Mar 19, 2025 at 1:47 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
Sending a new version as rebase was required.
Reading this thread, it seems to me that there has been a good deal of
discussion about things like the name of the feature and what the UI
ought to be, but not much discussion of whether the feature is
actually safe, and not much detailed review of the code. I'm willing
to bet that everybody wants some version of this feature if we can
convince ourselves that it's not going to do horrible things like
cause server crashes, and that even people who don't get their first
choice in terms of how the feature is named or how the GUCs work will
still be pretty happy to have it overall. However, if it breaks stuff
and there's no easy way to fix the breakage, that's going to be a
problem.
In broad strokes, the danger here is that doing stuff in the middle of
query execution that we currently only do at the end of query
execution will turn out to be problematic. The biggest problem, I
think, is whether it's safe to do all of the things that EXPLAIN does
while we're at some random point in query execution. It looks to me
like the wrap/unwrap stuff is more-or-less consistent with previous
discussions of how a feature of this kind should work, though I don't
recall the previous discussion and I think the patch should contain
some comments about why it works the way that it works. I do notice
that WrapExecProcNodeWithExplain does not walk the ps->initPlan list,
which I think is an oversight.
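For illustration, the missing piece would be roughly the loop below;
WrapExecProcNodeWithExplain is the patch's wrapper installer, while the helper
around it is purely hypothetical (SubPlanState and its planstate field are
existing executor structures):

/* Hypothetical helper: also wrap the subplans hanging off ps->initPlan */
static void
WrapInitPlansWithExplain(PlanState *ps)
{
	ListCell   *lc;

	foreach(lc, ps->initPlan)
	{
		SubPlanState *sps = (SubPlanState *) lfirst(lc);

		/* each initplan carries its own PlanState tree that needs wrapping */
		if (sps->planstate)
			WrapExecProcNodeWithExplain(sps->planstate);
	}
}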
Without having the prior discussion near to hand, I *think* that the
reason we wanted to do this wrap/unwrap stuff is to make it so that
the progressive EXPLAIN code could only execute when entering a new
plan node rather than at any random point inside of that plan node,
and that does seem a lot safer than the alternative. For example, I
think it means that we won't start trying to do progressive EXPLAIN
while already holding some random LWLock, which is very good. However,
this still means we could do a bunch of catalog access (and thus
potentially receive invalidations) at places where that can't happen
today. I'm not sure that's a problem -- the same thing could happen at
any place in the executor where we evaluate a user-supplied
expression, and there are many such places -- but it's certainly worth
a few senior people thinking real carefully about it and trying to
imagine whether there's any scenario in which it might break something
that works today.
One way in which this proposal seems safer than previous proposals is
that previous proposals have involved session A poking session B and
trying to get session B to emit an EXPLAIN on the fly with no prior
setup. That would be very useful, but I think it's more difficult and
more risky than this proposal, where all the configuration happens in
the session that is going to emit the EXPLAIN output. It knows from
the beginning that it's going to maybe be doing this, and so it can do
whatever setup it likes to accomplish that goal. So I think this
design looks pretty good from that point of view.
I don't understand how this would be safe against interrupts or
errors. If a running query is interrupted, what would cause
ProgressiveExplainCleanup() to be called? If the answer is "nothing,"
why isn't that unsafe?
ExecProcNodeOriginal looks like it's basically the same thing as
ExecProcNodeReal, except that it's for a different purpose. But you
would be hard-pressed to guess which one is used for what purpose
based on the field names or the comments. Maybe it's also worth
worrying about whether this is a scalable design. Can we find a way to
use the existing fields here instead of having to add a new one?
The documentation for the progressive_explain = { off | explain |
analyze } option seems like it should go into more detail about how
the "explain" and "analyze" values are different. I'm not 100% sure I
know the answer, and I'm not the least-experienced person who will
ever read this documentation.
WrapMultiExecProcNodesWithExplain seems like a poor choice of name. It
invites confusion with MultiExecProcNode, to which it is unrelated.
I just went to some trouble to start breaking up the monolith that is
explain.c, so I'm not that happy about seeing this patch dump another
800+ lines of source code into that file. Probably we should have a
new source file for some of this, or maybe even more than one.
The changes to explain.h add three new data types. Two of them begin
with an uppercase letter and one with a lowercase letter. That seems
inconsistent. I also don't think that the declaration of char plan[]
is per PostgreSQL coding style. I believe we always write char
plan[FLEXIBLE_ARRAY_MEMBER]. Also, maybe it should be named something
other than plan, since it's really a string-ified explain-y kind of
thing, not literally a Plan. Also, can we please not have structure
members with single letter names? "h" and "p" are going to be
completely ungreppable, and I like grepping.
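For example, the two shared structures could be spelled roughly as below; the
member names are only illustrative, not a prescription:

typedef struct ProgressiveExplainHashEntry
{
	int			pid;			/* hash key of entry - MUST BE FIRST */
	dsa_handle	area_handle;	/* DSA area holding the printed plan */
	dsa_pointer entry_ptr;		/* location of the plan data in that area */
} ProgressiveExplainHashEntry;

typedef struct ProgressiveExplainData
{
	int			pid;
	TimestampTz last_update;
	char		plan_text[FLEXIBLE_ARRAY_MEMBER];	/* string-ified plan */
} ProgressiveExplainData;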
It looks very strange to me that ProgressiveExplainPrint() seems to
have a theoretical possibility of generating the output and then
throwing it away if we end up with entry == NULL. I guess maybe that
case is not supposed to happen because ProgressiveExplainInit() is
supposed to create the entry, but then why isn't this an elog(ERROR)
or something instead of a no-op?
It seems like when we replace a longer entry with a shorter one, we
forget that it was originally longer. Hence, if the length of a
progressive EXPLAIN is alternately 2922 characters and 2923
characters, we'll reallocate on every other progressive EXPLAIN
instead of at most once.
I'll try to look at this some more tomorrow. It seems like a very
interesting patch, but time is very short for this release and it
doesn't look to me like we have all the kinks sorted out here just
yet.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi Robert,
Thanks for sparing part of your precious time to look at the patch.
I acknowledge it is a very complex one. Since you're going to take
another look, providing some preliminary comments related to some
of the implementation concerns.
I don't understand how this would be safe against interrupts or
errors. If a running query is interrupted, what would cause
ProgressiveExplainCleanup() to be called? If the answer is "nothing,"
why isn't that unsafe?
The strategy I used here is to use a MemoryContextCallback
(ProgressiveExplainReleaseFunc), configured in the memory context
where the query is being executed, which is responsible for calling
ProgressiveExplainCleanup() if the query doesn't end gracefully.
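Roughly, the idea looks like the sketch below; MemoryContextRegisterResetCallback
and MemoryContextCallback are the existing backend facilities, while the
registration helper and its exact wiring are only illustrative:

static void
ProgressiveExplainReleaseFunc(void *arg)
{
	/* runs when the query's memory context is reset or deleted */
	ProgressiveExplainCleanup((QueryDesc *) arg);
}

static void
ProgressiveExplainRegisterCleanup(QueryDesc *queryDesc)
{
	MemoryContext cxt = queryDesc->estate->es_query_cxt;
	MemoryContextCallback *cb;

	/* allocate the callback in the tracked context so it lives exactly as long */
	cb = MemoryContextAlloc(cxt, sizeof(MemoryContextCallback));
	cb->func = ProgressiveExplainReleaseFunc;
	cb->arg = queryDesc;
	MemoryContextRegisterResetCallback(cxt, cb);
}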
It looks very strange to me that ProgressiveExplainPrint() seems to
have a theoretical possibility of generating the output and then
throwing it away if we end up with entry == NULL. I guess maybe that
case is not supposed to happen because ProgressiveExplainInit() is
supposed to create the entry, but then why isn't this an elog(ERROR)
or something instead of a no-op?
Agreed. Will fix this.
It seems like when we replace a longer entry with a shorter one, we
forget that it was originally longer. Hence, if the length of a
progressive EXPLAIN is alternately 2922 characters and 2923
characters, we'll reallocate on every other progressive EXPLAIN
instead of at most once.
Are you talking about re-printing the plan in the same query execution?
The logic for the code, using your example, would be to allocate 2922 +
PROGRESSIVE_EXPLAIN_FREE_SIZE (4096, currently) initially. If next plans
alternate between 2922 and 2923 no additional allocation will be done.
A reallocation will be needed only if the plan length ends up exceeding
2922+4096. At the end of query execution (or cancellation) that DSA will
be freed and a next query execution will have to allocate again using the
same logic.
Regarding the execProcNode wrapper strategy: I used it precisely because
of the discussion in that other patch. I actually tried not using it here,
calling ProgressiveExplainPrint() in the timeout callback instead. This
resulted in sporadic crashes, confirming the suspicion that it is not a
good idea.
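The shape of the wrapper is roughly the following; ExecProcNodeOriginal and
ProgressiveExplainPending are from the patch, while the function name and the
print call's signature here are just a sketch of the idea, not the patch's
actual code:

static TupleTableSlot *
ExecProcNodeWithProgressiveExplain(PlanState *node)
{
	/* the timeout handler only sets this flag; nothing is printed there */
	if (ProgressiveExplainPending)
	{
		ProgressiveExplainPending = false;
		ProgressiveExplainPrint(node);	/* emit the plan at a safe node boundary */
	}

	/* chain to whatever ExecProcNode function was in place before wrapping */
	return node->ExecProcNodeOriginal(node);
}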
Regarding all other comments related to variable/function names and having
the feature in a separate file, agree with all the comments. Will send a
new version with the fixes.
Rafael.
On Wed, Mar 19, 2025 at 6:53 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
The strategy I used here is to use a MemoryContextCallback
(ProgressiveExplainReleaseFunc), configured in the memory context
where the query is being executed, which is responsible for calling
ProgressiveExplainCleanup() if the query doesn't end gracefully.
Thanks for the pointer. I'm a bit skeptical about what's going on here
in ProgressiveExplainReleaseFunc(). It seems like we're depending on
shared memory to tell us whether we need to do purely backend-local
cleanup, like calling disable_timeout() and resetting
ProgressiveExplainPending and activeQueryDesc. I would have expected
us to keep track in local memory of whether this kind of work needs to
be done. It seems roundabout to take an LWLock, do a hash table lookup
to see if there's an entry there, release the LWLock, and then very
shortly thereafter take the lock a second time to release the entry
that we now know is there.
The comment in ProgressiveExplainCleanup about only detaching from the
DSA if the query ended gracefully is not ideal from my point of view
because it says what the code does instead of why the code does that
thing. Also, the function is seemingly called with queryDesc as an
argument not because you need it for anything but because you're going
to test whether it's null. In that case, you could just pass a
Boolean. Even then, something seems odd about this: why do we have to
be holding ProgressiveExplainHashLock to dsa_detach the
somewhat-inscrutably named area pe_a? And why are we not detaching it
in case of error?
I am wondering why you chose this relatively unusual error cleanup
strategy. What I would have done is invent AtEOXact_ProgressiveExplain
and AtSubEOXact_ProgressiveExplain. In some sense this looks simpler,
because it doesn't need separate handling for transactions and
subtransactions, but it's so unlike what we do in most places that
it's hard for me to tell whether it's correct. I feel like the
approach you've chosen here would make sense if what we wanted to do
was basically release memory or some memory-like resource associated
closely with the context -- e.g. expandedrecord.c releases a
TupleDesc, but this is doing more than that.
I think the effect of this choice is that cleanup of the
progressive-EXPLAIN stuff happens much later than it normally would.
Most of the time, in the case of an abort, we would want
AbortTransaction() to release as many resources as possible, leaving
basically no work to do at CleanupTransaction() time. This is so that
if a user types "SELECT 1/0;" we release resources, as far as
possible, right away, and don't wait for them to subsequently type
"ROLLBACK;". The transaction lives on only as a shell. But these
resources, if I'm reading this right, are going to stick around until
the transaction is actually rolled back, because memory is not freed
until CleanupTransaction() time. I wonder what happens if a query
inside of an explicit transaction aborts after putting an entry in the
progressive-explain view. My guess is that the entry will stick around
until the actual rollback happens.
In fact, now that I think about it, this is probably why we don't
dsa_detach() in ProgressiveExplainCleanup() in cases of error -- the
resource owner cleanup will have already released the DSA segments
long before the memory context is deleted.
I'm sort of inclined to say that this should be rewritten to do error
cleanup in a more normal way. It's probably more code to do it that
way, but I think having it be more similar to other subsystems is
probably worth quite a bit.
It seems like when we replace a longer entry with a shorter one, we
forget that it was originally longer. Hence, if the length of a
progressive EXPLAIN is alternately 2922 characters and 2923
characters, we'll reallocate on every other progressive EXPLAIN
instead of at most once.
Are you talking about re-printing the plan in the same query execution?
Yes.
The logic for the code, using your example, would be to allocate 2922 +
PROGRESSIVE_EXPLAIN_FREE_SIZE (4096, currently) initially. If next plans
alternate between 2922 and 2923 no additional allocation will be done.
A reallocation will be needed only if the plan length ends up exceeding
2922+4096. At the end of query execution (or cancellation) that DSA will
be freed and a next query execution will have to allocate again using the
same logic.
It seems to me that if ProgressiveExplainPrint() reaches /* Update
shared memory with new data */ without reallocating,
strlen(pe_data->plan) can be reduced. On the next trip through the
function, we don't know whether the string we're seeing is the
original string -- for which strlen()+PROGRESSIVE_EXPLAIN_FREE_SIZE)
gives us the original allocation size -- or whether the string we're
seeing is a shorter one that was copied over the original, longer
string. PROGRESSIVE_EXPLAIN_FREE_SIZE is big enough that this probably
isn't much of a problem in practice, because consecutive EXPLAIN
outputs for the same query probably won't vary in size by enough to
cause any reallocation. Looking at this more carefully, I think that
the query plan would have to shrink in size by >4kB and then expand
again in order to trigger reallocation, which seems unlikely. But it
still seems unclean to me. Normally, we track how much space we have
actually allocated explicitly, instead of reconstructing that number
from something else, especially something that isn't guaranteed to
produce an accurate result. I think you should just store the number
of available payload bytes at the beginning of the chunk and then
reallocate if it isn't big enough to hold the payload that you have
got.
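Concretely, that could look something like this sketch, where allocated_size
is the stored capacity being suggested and the other names are illustrative
(dsa_allocate/dsa_free are the existing DSA calls):

static void
ProgressiveExplainEnsureSpace(dsa_area *area,
							  ProgressiveExplainHashEntry *entry,
							  Size payload_len)
{
	Size		needed = offsetof(ProgressiveExplainData, plan_text) +
		payload_len + 1;

	/* reallocate only when the stored capacity is too small for the payload */
	if (entry->entry_ptr == InvalidDsaPointer || needed > entry->allocated_size)
	{
		if (entry->entry_ptr != InvalidDsaPointer)
			dsa_free(area, entry->entry_ptr);

		/* keep headroom so small growth doesn't force another allocation */
		entry->allocated_size = needed + PROGRESSIVE_EXPLAIN_FREE_SIZE;
		entry->entry_ptr = dsa_allocate(area, entry->allocated_size);
	}
}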
Regarding the execProcNode wrapper strategy: I used it precisely because
of the discussion in that other patch. I actually tried not using it here,
calling ProgressiveExplainPrint() in the timeout callback instead. This
resulted in sporadic crashes, confirming the suspicion that it is not a
good idea.
Makes sense, but we need adequate documentation of what we did and why
it works (or at least why we think it works).
Another thing I just noticed is that pg_stat_progress_explain() uses
beentry->st_procpid == ped->pid as the permissions check, but a more
typical test is HAS_PGSTAT_PERMISSIONS(beentry->st_userid). I know
that's only in pgstatfuncs.c, but I think it would be OK to duplicate
the associated test here (i.e. has_privs_of_role(GetUserId(),
ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role)). I
don't really see a reason why this shouldn't use the same permission
rules as other pg_stat_ things, in particular pg_stat_get_activity().
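For reference, the check being suggested amounts to something like this;
has_privs_of_role(), GetUserId() and ROLE_PG_READ_ALL_STATS are existing
backend symbols, the wrapper function is illustrative:

static bool
ProgressiveExplainEntryVisible(Oid backend_userid)
{
	/* same rule as other pg_stat_* views: pg_read_all_stats or same role */
	return has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) ||
		has_privs_of_role(GetUserId(), backend_userid);
}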
--
Robert Haas
EDB: http://www.enterprisedb.com
Hello Robert,
Fixed most of the recommendations. Going over one at a time.
The documentation for the progressive_explain = { off | explain |
analyze } option seems like it should go into more detail about how
the "explain" and "analyze" values are different. I'm not 100% sure I
know the answer, and I'm not the least-experienced person who will
ever read this documentation.
Added details about behavior of each option. In the doc for that GUC
there is a link to another section that explains in detail how progressive
explains work.
WrapMultiExecProcNodesWithExplain seems like a poor choice of name. It
invites confusion with MultiExecProcNode, to which it is unrelated.
Renamed that function (and the unwrap equivalent) to
WrapMemberNodesExecProcNodesWithExplain
as MemberNodes is the name used by explain when parsing those types of
nodes.
I do notice that WrapExecProcNodeWithExplain does not walk
the ps->initPlan list, which I think is an oversight.
Fixed.
I just went to some trouble to start breaking up the monolith that is
explain.c, so I'm not that happy about seeing this patch dump another
800+ lines of source code into that file. Probably we should have a
new source file for some or this, or maybe even more than one.
The whole progressive explain code was moved to a new set of files
explain_progressive.h and explain_progressive.c.
The changes to explain.h add three new data types. Two of them begin
with an uppercase letter and one with a lowercase letter. That seems
inconsistent. I also don't think that the declaration of char plan[]
is per PostgreSQL coding style. I believe we always write char
plan[FLEXIBLE_ARRAY_MEMBER]. Also, maybe it should be named something
other than plan, since it's really a string-ified explain-y kind of
thing, not literally a Plan. Also, can we please not have structure
members with single letter names? "h" and "p" are going to be
completely ungreppable, and I like grepping
Done. Adjusted all data types to be uppercase. Added FLEXIBLE_ARRAY_MEMBER
and renamed "h" and "p".
It looks very strange to me that ProgressiveExplainPrint() seems to
have a theoretical possibility of generating the output and then
throwing it away if we end up with entry == NULL. I guess maybe that
case is not supposed to happen because ProgressiveExplainInit() is
supposed to create the entry, but then why isn't this an elog(ERROR)
or something instead of a no-op?
Changed to throw ERROR, which shouldn't happen.
It seems like when we replace a longer entry with a shorter one, we
forget that it was originally longer. Hence, if the length of a
progressive EXPLAIN is alternately 2922 characters and 2923
characters, we'll reallocate on every other progressive EXPLAIN
instead of at most once.
Adjusted this logic. Structure ProgressiveExplainHashEntry now contains
a field to store the allocated size, which is used to compare with
the new plan being printed.
Thanks for the pointer. I'm a bit skeptical about what's going on here
in ProgressiveExplainReleaseFunc(). It seems like we're depending on
shared memory to tell us whether we need to do purely backend-local
cleanup, like calling disable_timeout() and resetting
ProgressiveExplainPending and activeQueryDesc. I would have expected
us to keep track in local memory of whether this kind of work needs to
be done. It seems roundabout to take an LWLock, do a hash table lookup
to see if there's an entry there, release the LWLock, and then very
shortly thereafter take the lock a second time to release the entry
that we now know is there.
The comment in ProgressiveExplainCleanup about only detaching from the
DSA if the query ended gracefully is not ideal from my point of view
because it says what the code does instead of why the code does that
thing. Also, the function is seemingly called with queryDesc as an
argument not because you need it for anything but because you're going
to test whether it's null. In that case, you could just pass a
Boolean. Even then, something seems odd about this: why do we have to
be holding ProgressiveExplainHashLock to dsa_detach the
somewhat-inscrutably named area pe_a? And why are we not detaching it
in case of error?
I am wondering why you chose this relatively unusual error cleanup
strategy. What I would have done is invent AtEOXact_ProgressiveExplain
and AtSubEOXact_ProgressiveExplain. In some sense this looks simpler,
because it doesn't need separate handling for transactions and
subtransactions, but it's so unlike what we do in most places that
it's hard for me to tell whether it's correct. I feel like the
approach you've chosen here would make sense if what we wanted to do
was basically release memory or some memory-like resource associated
closely with the context -- e.g. expandedrecord.c releases a
TupleDesc, but this is doing more than that.
I think the effect of this choice is that cleanup of the
progressive-EXPLAIN stuff happens much later than it normally would.
Most of the time, in the case of an abort, we would want
AbortTransaction() to release as many resources as possible, leaving
basically no work to do at CleanupTransaction() time. This is so that
if a user types "SELECT 1/0;" we release resources, as far as
possible, right away, and don't wait for them to subsequently type
"ROLLBACK;". The transaction lives on only as a shell. But these
resources, if I'm reading this right, are going to stick around until
the transaction is actually rolled back, because memory is not freed
until CleanupTransaction() time. I wonder what happens if a query
inside of an explicit transaction aborts after putting an entry in the
progressive-explain view. My guess is that the entry will stick around
until the actual rollback happens.
In fact, now that I think about it, this is probably why we don't
dsa_detach() in ProgressiveExplainCleanup() in cases of error -- the
resource owner cleanup will have already released the DSA segments
long before the memory context is deleted.
I'm sort of inclined to say that this should be rewritten to do error
cleanup in a more normal way. It's probably more code to do it that
way, but I think having it be more similar to other subsystems is
probably worth quite a bit.
Right. I use dsa_detach() only when the query finishes gracefully. This
is needed, otherwise PG will complain with:
WARNING: resource was not closed: dynamic shared memory segment 32349774
That WARNING doesn't appear in case of error, and according to my tests
shared memory is correctly being released.
I completely removed the MemoryContextCallback strategy and switched to
using the new function AtEOXact_ProgressiveExplain() called from
AbortTransaction().
This first version of the progressive explain feature was designed to only
keep track of the initial query called by the backend, ignoring all subquery
calls. So I believe we don't need to worry about having to add custom logic
in AbortSubTransaction(). In case the query errors out, AbortTransaction()
will be called and everything related to progressive explains will be
cleaned up.
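A minimal sketch of that shape, assuming a backend-local flag tracks whether
anything was published (the flag and the cleanup internals are illustrative;
only the AbortTransaction() call site matches the v11 patch):

void
AtEOXact_ProgressiveExplain(void)
{
	/* nothing to do if this backend never published a progressive plan */
	if (!ProgressiveExplainActive)
		return;

	/* stop the timers that would otherwise keep firing */
	disable_timeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT, false);
	disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);

	/* drop this backend's entry from the shared hash */
	ProgressiveExplainCleanup(NULL);

	ProgressiveExplainActive = false;
}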
Another thing I just noticed is that pg_stat_progress_explain() uses
beentry->st_procpid == ped->pid as the permissions check, but a more
typical test is HAS_PGSTAT_PERMISSIONS(beentry->st_userid). I know
that's only in pgstatfuncs.c, but I think it would be OK to duplicate
the associated test here (i.e. has_privs_of_role(GetUserId(),
ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role)). I
don't really see a reason why this shouldn't use the same permission
rules as other pg_stat_ things, in particular pg_stat_get_activity().
Adjusted the logic to use has_privs_of_role(GetUserId(),
ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role) when
checking the privileges.
Rafael.
Attachments:
v11-0001-Proposal-for-progressive-explains.patch (application/octet-stream)
From a1037bf3282c5aaf25904bc84c26d7ac9e017561 Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Tue, 25 Mar 2025 19:05:59 -0300
Subject: [PATCH v11] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize via new view pg_stat_progress_explain.
Plans are only printed if new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan will be printed
only once after progressive_explain_min_duration. If set to 'analyze'
the QueryDesc will be adjusted adding instrumentation flags. In that
case the plan will be printed on a fixed interval controlled by new
GUC parameter progressive_explain_interval including all instrumentation
stats computed so far (per node rows and execution time).
New view:
- pg_stat_progress_explain
- datid: OID of the database
- datname: Name of the database
- pid: PID of the process running the query
- last_update: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_min_duration: min query duration until progressive
explain starts.
- type: int
- default: 1s
- min: 0
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [TEXT, XML, JSON, or YAML]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: true
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_costs: controls whether estimated startup and
total cost of each plan node is added to the printed plan.
- type: bool
- default: true
- context: user
---
contrib/auto_explain/auto_explain.c | 10 +-
doc/src/sgml/config.sgml | 156 ++++
doc/src/sgml/monitoring.sgml | 82 ++
doc/src/sgml/perform.sgml | 97 +++
src/backend/access/transam/xact.c | 2 +
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/Makefile | 1 +
src/backend/commands/explain.c | 162 ++--
src/backend/commands/explain_format.c | 12 +
src/backend/commands/explain_progressive.c | 746 ++++++++++++++++++
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 14 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 19 +
src/backend/utils/misc/guc_tables.c | 130 +++
src/backend/utils/misc/postgresql.conf.sample | 14 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain_progressive.h | 56 ++
src/include/commands/explain_state.h | 9 +
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 11 +
src/include/utils/timeout.h | 2 +
.../test_misc/t/008_progressive_explain.pl | 130 +++
src/test/regress/expected/rules.out | 7 +
src/tools/pgindent/typedefs.list | 3 +
32 files changed, 1687 insertions(+), 59 deletions(-)
create mode 100644 src/backend/commands/explain_progressive.c
create mode 100644 src/include/commands/explain_progressive.h
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 3b73bd19107..769b79ecc6b 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -39,14 +39,6 @@ static int auto_explain_log_level = LOG;
static bool auto_explain_log_nested_statements = false;
static double auto_explain_sample_rate = 1;
-static const struct config_enum_entry format_options[] = {
- {"text", EXPLAIN_FORMAT_TEXT, false},
- {"xml", EXPLAIN_FORMAT_XML, false},
- {"json", EXPLAIN_FORMAT_JSON, false},
- {"yaml", EXPLAIN_FORMAT_YAML, false},
- {NULL, 0, false}
-};
-
static const struct config_enum_entry loglevel_options[] = {
{"debug5", DEBUG5, false},
{"debug4", DEBUG4, false},
@@ -188,7 +180,7 @@ _PG_init(void)
NULL,
&auto_explain_log_format,
EXPLAIN_FORMAT_TEXT,
- format_options,
+ explain_format_options,
PGC_SUSET,
0,
NULL,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 69fc93dffc4..ff0b147877e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8674,6 +8674,162 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ When set to <literal>explain</literal> the plan will be printed only
+ once after <xref linkend="guc-progressive-explain-min-duration"/>. If
+ set to <literal>analyze</literal>, instrumentation flags are enabled,
+ causing the plan to be printed on a fixed interval controlled by
+ <xref linkend="guc-progressive-explain-interval"/> including all
+ instrumentation stats computed so far.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum query duration (in milliseconds) before the progressive
+ explain is printed for the first time. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on buffer usage is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on per node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-costs" xreflabel="progressive_explain_costs">
+ <term><varname>progressive_explain_costs</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_costs</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on the estimated startup and total cost of
+ each plan node is added to progressive explains. Equivalent to the COSTS
+ option of EXPLAIN.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0960f5ba94a..14780c54565 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6840,6 +6840,88 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_update</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 387baac7e8c..04a78f29df9 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1169,6 +1169,103 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query can
+ be visualized by any session via the <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query, and after the
+ minimum duration specified by <xref linkend="guc-progressive-explain-min-duration"/>
+ has passed. Settings <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/>,
+ <xref linkend="guc-progressive-explain-settings"/> and
+ <xref linkend="guc-progressive-explain-costs"/>
+ define how the plan is printed and which details are added there.
+ </para>
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:01.606324-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ will enable instrumentation and the detailed plan is printed on a fixed interval
+ controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+ per node accumulated row count and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional information per node to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:53.951884-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=1581.469..2963.158 rows=64862.00 loops=1) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.079..486.702 rows=8258962.00 loops=1) (current)+
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=1580.933..1580.933 rows=10000000.00 loops=1) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.004..566.961 rows=10000000.00 loops=1) +
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b885513f765..2a0214d4935 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,6 +36,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "common/pg_prng.h"
@@ -2993,6 +2994,7 @@ AbortTransaction(void)
AtEOXact_PgStat(false, is_parallel_worker);
AtEOXact_ApplyLauncher(false);
AtEOXact_LogicalRepWorkers(false);
+ AtEOXact_ProgressiveExplain();
pgstat_report_xact_timestamp(0);
}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31d269b7ee0..767735c1a2c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,16 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ S.datid AS datid,
+ D.datname AS datname,
+ S.pid,
+ S.last_update,
+ S.query_plan
+ FROM pg_stat_progress_explain() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index cb2fbdc7c60..e10224b2cd2 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -36,6 +36,7 @@ OBJS = \
explain.o \
explain_dr.o \
explain_format.o \
+ explain_progressive.o \
explain_state.o \
extension.o \
foreigncmds.o \
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 391b34a2af2..bca916634f8 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -20,9 +20,11 @@
#include "commands/explain.h"
#include "commands/explain_dr.h"
#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
#include "commands/explain_state.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
+#include "funcapi.h"
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
@@ -34,6 +36,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
+#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/guc_tables.h"
#include "utils/json.h"
@@ -41,6 +44,7 @@
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
+#include "utils/timeout.h"
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
@@ -139,7 +143,7 @@ static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -596,6 +600,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
/* We can't run ExecutorEnd 'till we're done printing the stats... */
totaltime += elapsed_time(&starttime);
}
+ else
+ {
+ /*
+ * Handle progressive explain cleanup manually if enabled as it is not
+ * called as part of ExecutorFinish.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+ }
/* grab serialization metrics before we destroy the DestReceiver */
if (es->serialize != EXPLAIN_SERIALIZE_NONE)
@@ -1371,6 +1384,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1834,17 +1848,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is called directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1854,6 +1889,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
}
else
{
@@ -1866,6 +1904,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
}
}
else if (es->analyze)
@@ -1970,13 +2012,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_indexsearches_info(planstate, es);
break;
case T_IndexOnlyScan:
@@ -1984,16 +2026,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
show_indexsearches_info(planstate, es);
break;
case T_BitmapIndexScan:
@@ -2006,11 +2048,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2027,7 +2069,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2038,7 +2080,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2062,7 +2104,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2096,7 +2138,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2110,7 +2152,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2128,7 +2170,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2145,14 +2187,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2162,7 +2204,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2172,11 +2214,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2185,11 +2227,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2198,11 +2240,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2210,7 +2252,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_window_def(castNode(WindowAggState, planstate), ancestors, es);
@@ -2219,7 +2261,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
break;
case T_Group:
@@ -2227,7 +2269,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2249,7 +2291,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2294,10 +2336,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3975,19 +4017,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4668,7 +4710,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4677,11 +4719,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4701,11 +4756,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
diff --git a/src/backend/commands/explain_format.c b/src/backend/commands/explain_format.c
index 752691d56db..c0d6973d1e5 100644
--- a/src/backend/commands/explain_format.c
+++ b/src/backend/commands/explain_format.c
@@ -16,6 +16,7 @@
#include "commands/explain.h"
#include "commands/explain_format.h"
#include "commands/explain_state.h"
+#include "utils/guc_tables.h"
#include "utils/json.h"
#include "utils/xml.h"
@@ -25,6 +26,17 @@
#define X_CLOSE_IMMEDIATE 2
#define X_NOWHITESPACE 4
+/*
+ * GUC support
+ */
+const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
diff --git a/src/backend/commands/explain_progressive.c b/src/backend/commands/explain_progressive.c
new file mode 100644
index 00000000000..657670661e9
--- /dev/null
+++ b/src/backend/commands/explain_progressive.c
@@ -0,0 +1,746 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.c
+ * Code for the progressive explain feature
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/explain_progressive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_authid.h"
+#include "commands/explain.h"
+#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "utils/acl.h"
+#include "utils/backend_status.h"
+#include "utils/builtins.h"
+#include "utils/guc_tables.h"
+#include "utils/timeout.h"
+
+
+#define PROGRESSIVE_EXPLAIN_FREE_SIZE 4096
+
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainHash = NULL;
+
+/* Pointer to running query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Flag set by timeouts to control when to update progressive explains */
+bool ProgressiveExplainPending = false;
+
+static void ProgressiveExplainInit(QueryDesc *queryDesc);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(QueryDesc *queryDesc);
+static TupleTableSlot *ExecProcNodeExplain(PlanState *node);
+static void WrapExecProcNodeWithExplain(PlanState *ps);
+static void UnwrapExecProcNodeWithExplain(PlanState *ps);
+
+
+
+/*
+ * ProgressiveExplainSetup -
+ * Adjusts QueryDesc with instrumentation for progressive explains.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we adjust QueryDesc accordingly even if
+ * the query was not initiated with EXPLAIN ANALYZE. This will
+ * directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Setup only if this is the outer most query */
+ if (activeQueryDesc == NULL)
+ {
+ /* Track the outermost query desc */
+ activeQueryDesc = queryDesc;
+ queryDesc->pestate = NULL;
+
+ /* Setup instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+ }
+}
+
+/*
+ * ProgressiveExplainStart
+ * Progressive explain start point.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ /*
+ * Progressive explain is only done for the outermost query descriptor.
+ */
+ if (queryDesc == activeQueryDesc)
+ {
+ /* Timeout is only needed if duration > 0 */
+ if (progressive_explain_min_duration == 0)
+ ProgressiveExplainInit(queryDesc);
+ else
+ enable_timeout_after(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ progressive_explain_min_duration);
+ }
+}
+
+/*
+ * ProgressiveExplainInit -
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ * We define an ExplainState that will be reused for every iteration of
+ * plan printing.
+ *
+ * Progressive explain plans are printed in shared memory via DSAs. A
+ * dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer pointers.
+ *
+ * A memory context release callback is defined for manual resource release
+ * in case of query cancellations.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise
+ * the plan is printed only once.
+ */
+void
+ProgressiveExplainInit(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ ProgressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pestate = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+ es->costs = progressive_explain_costs;
+
+ /* Define the DSA and share through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ entry = (ProgressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_ENTER,
+ &found);
+
+ entry->pe_h = dsa_get_handle(es->pe_a);
+ entry->pe_p = (dsa_pointer) NULL;
+
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+ /* Print the progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainTrigger -
+ * Called by the timeout handler to start printing progressive
+ * explain plans.
+ */
+void
+ProgressiveExplainTrigger(void)
+{
+ /*
+ * Check that the pointer is still set to avoid a corner case where the
+ * timeout function is called after cleanup is done.
+ */
+ if (activeQueryDesc)
+ WrapExecProcNodeWithExplain(activeQueryDesc->planstate);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pestate->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pestate->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint -
+ * Prints the progressive explain plan into shared memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the currently printed plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+ QueryDesc *currentQueryDesc = queryDesc;
+ ProgressiveExplainHashEntry *entry;
+ ProgressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pestate;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+ entry = (ProgressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_FIND,
+ NULL);
+
+ /* Entry must already exist in shared memory at this point */
+ if (!entry)
+ elog(ERROR, "no entry in progressive explain hash for pid %d",
+ MyProcPid);
+ else
+ {
+ /* Plan was never printed */
+ if (!entry->pe_p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->pe_p);
+
+ /*
+ * Plan does not fit in existing shared memory area. Reallocation
+ * is needed.
+ */
+ if (strlen(es->str->data) > entry->plan_alloc_size)
+ {
+ dsa_free(es->pe_a, entry->pe_p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently printed
+ * query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_FREE_SIZE. This strategy prevents having to
+ * reallocate the segment very often, which would be needed in
+ * case the length of the next printed exceeds the previously
+ * allocated size.
+ */
+ entry->plan_alloc_size = add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_FREE_SIZE);
+ entry->pe_p = dsa_allocate(es->pe_a,
+ add_size(sizeof(ProgressiveExplainData),
+ entry->plan_alloc_size));
+ pe_data = dsa_get_address(es->pe_a, entry->pe_p);
+ pe_data->pid = MyProcPid;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_update = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ProgressiveExplainFinish -
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ /*
+ * Progressive explain is only done for the outermost query descriptor.
+ */
+ if (queryDesc == activeQueryDesc)
+ {
+ /* Startup timeout hasn't triggered yet, just disable it */
+ if (get_timeout_active(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT))
+ disable_timeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT, false);
+ /* Initial progressive explain was done, clean up everything */
+ else if (queryDesc && queryDesc->pestate)
+ ProgressiveExplainCleanup(queryDesc);
+ }
+}
+
+/*
+ * ProgressiveExplainIsActive -
+ * Checks whether progressive explain is active for the given query descriptor.
+ */
+bool
+ProgressiveExplainIsActive(QueryDesc *queryDesc)
+{
+ if (queryDesc == activeQueryDesc)
+ return true;
+ else
+ return false;
+}
+
+/*
+ * End-of-transaction cleanup for progressive explains.
+ */
+void
+AtEOXact_ProgressiveExplain(void)
+{
+ /*
+ * Only perform cleanup if a query descriptor is being tracked, which means
+ * that the feature was enabled for the last query executed before the
+ * abort.
+ */
+ if (activeQueryDesc)
+ ProgressiveExplainCleanup(NULL);
+}
+
+/*
+ * ProgressiveExplainCleanup -
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures not automatically released by the memory
+ * context removal. Current tasks are:
+ * - remove local backend from progressive explain hash
+ * - detach from DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(QueryDesc *queryDesc)
+{
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ /* Reset querydesc tracker */
+ activeQueryDesc = NULL;
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /*
+ * Only detach from the DSA if the query ended gracefully, i.e., if
+ * ProgressiveExplainCleanup was called from function
+ * ProgressiveExplainFinish.
+ */
+ if (queryDesc)
+ dsa_detach(queryDesc->pestate->pe_a);
+ hash_search(progressiveExplainHash, &MyProcPid, HASH_REMOVE, NULL);
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ExecProcNodeExplain -
+ * ExecProcNode wrapper that initializes progressive explain
+ * and unwraps ExecProcNode back to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)
+{
+ /* Initialize progressive explain */
+ ProgressiveExplainInit(node->state->query_desc);
+
+ /* Unwrap exec proc node for all nodes */
+ UnwrapExecProcNodeWithExplain(node->state->query_desc->planstate);
+
+ /*
+ * Since unwrapping has already been done, call ExecProcNode(), not
+ * ExecProcNodeOriginal().
+ */
+ return node->ExecProcNode(node);
+}
+
+/*
+ * ExecProcNodeInstrExplain -
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update the progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * WrapMemberNodesExecProcNodesWithExplain -
+ * Wrap array of PlanStates ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+WrapMemberNodesExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ WrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * WrapCustomPlanChildExecProcNodesWithExplain -
+ * Wrap CustomScanstate children's ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+WrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ WrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * WrapSubPlansExecProcNodesWithExplain -
+ * Wrap SubPlans ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+WrapSubPlansExecProcNodesWithExplain(List *plans)
+{
+ ListCell *lst;
+
+ foreach(lst, plans)
+ {
+ SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+ WrapExecProcNodeWithExplain(sps->planstate);
+ }
+}
+
+/*
+ * WrapExecProcNodeWithExplain -
+ * Wrap ExecProcNode with ExecProcNodeWithExplain recursively
+ */
+static void
+WrapExecProcNodeWithExplain(PlanState *ps)
+{
+ /* wrapping can be done only once */
+ if (ps->ExecProcNodeOriginal != NULL)
+ return;
+
+ check_stack_depth();
+
+ ps->ExecProcNodeOriginal = ps->ExecProcNode;
+ ps->ExecProcNode = ExecProcNodeExplain;
+
+ /* initPlan-s */
+ if (ps->initPlan != NULL)
+ WrapSubPlansExecProcNodesWithExplain(ps->initPlan);
+
+ /* lefttree */
+ if (ps->lefttree != NULL)
+ WrapExecProcNodeWithExplain(ps->lefttree);
+ /* righttree */
+ if (ps->righttree != NULL)
+ WrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ WrapMemberNodesExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ WrapMemberNodesExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ WrapMemberNodesExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ WrapMemberNodesExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ WrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ WrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * UnWrapMemberNodesExecProcNodesWithExplain -
+ * Unwrap array of PlanStates ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+UnWrapMemberNodesExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ UnwrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * UnwrapCustomPlanChildExecProcNodesWithExplain -
+ * Unwrap CustomScanstate children's ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+UnwrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ UnwrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * UnwrapSubPlansExecProcNodesWithExplain -
+ * Unwrap SubPlans ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+UnwrapSubPlansExecProcNodesWithExplain(List *plans)
+{
+ ListCell *lst;
+
+ foreach(lst, plans)
+ {
+ SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+ UnwrapExecProcNodeWithExplain(sps->planstate);
+ }
+}
+
+/*
+ * UnwrapExecProcNodeWithExplain -
+ * Unwrap ExecProcNode with ExecProcNodeWithExplain recursively
+ */
+static void
+UnwrapExecProcNodeWithExplain(PlanState *ps)
+{
+ Assert(ps->ExecProcNodeOriginal != NULL);
+
+ check_stack_depth();
+
+ ps->ExecProcNode = ps->ExecProcNodeOriginal;
+ ps->ExecProcNodeOriginal = NULL;
+
+ /* initPlan-s */
+ if (ps->initPlan != NULL)
+ UnwrapSubPlansExecProcNodesWithExplain(ps->initPlan);
+
+ /* lefttree */
+ if (ps->lefttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->lefttree);
+ /* righttree */
+ if (ps->righttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ UnWrapMemberNodesExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ UnWrapMemberNodesExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ UnWrapMemberNodesExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ UnWrapMemberNodesExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ UnwrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ UnwrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(ProgressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash -
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(int);
+ info.entrysize = sizeof(ProgressiveExplainHashEntry);
+
+ progressiveExplainHash = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain -
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 4
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ HASH_SEQ_STATUS hash_seq;
+ ProgressiveExplainHashEntry *entry;
+ dsa_area *a;
+ ProgressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainHash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->pe_p) ||
+ MyProcPid == entry->pid)
+ continue;
+
+ a = dsa_attach(entry->pe_h);
+ ped = dsa_get_address(a, entry->pe_p);
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+ values[1] = Int32GetDatum(ped->pid);
+ values[2] = TimestampTzGetDatum(ped->last_update);
+
+ if (has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) ||
+ has_privs_of_role(GetUserId(), beentry->st_userid))
+ values[3] = CStringGetTextDatum(ped->plan);
+ else
+ values[3] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ return (Datum) 0;
+}
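
For reference, a monitoring session could correlate the new view with
pg_stat_activity to see the query text next to its in-flight plan. This is
a minimal sketch using the view columns defined in this patch (datid, pid,
last_update, query_plan):

    SELECT a.pid, a.query, e.last_update, e.query_plan
    FROM pg_stat_progress_explain e
    JOIN pg_stat_activity a USING (pid)
    WHERE a.state = 'active';
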
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e9bd98c7738..cbb074eaa93 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain_progressive.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -157,6 +158,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Set up progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -182,6 +189,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Add a back reference to the QueryDesc.
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -267,6 +279,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -516,6 +534,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..82fe90bf231 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain_progressive.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -462,7 +464,17 @@ ExecProcNodeFirst(PlanState *node)
* have ExecProcNode() directly call the relevant function from now on.
*/
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ /*
+ * Use wrapper for progressive explains if enabled and the node
+ * belongs to the currently tracked query descriptor.
+ */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE &&
+ ProgressiveExplainIsActive(node->state->query_desc))
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..389f5b55831 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -302,6 +304,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up progressive explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 5702c35bb91..7097312b1a8 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -177,6 +177,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 9fa12a555e8..b53bc61d0d8 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -349,6 +349,7 @@ DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
AioWorkerSubmissionQueue "Waiting to access AIO worker submission queue."
+ProgressiveExplainHash "Waiting to access backend progressive explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7958ea11b73..3b55a061f3e 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain_progressive.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -82,6 +83,8 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainStartupTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -771,6 +774,10 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ ProgressiveExplainStartupTimeoutHandler);
}
/*
@@ -1432,6 +1439,18 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainStartupTimeoutHandler(void)
+{
+ ProgressiveExplainTrigger();
+}
+
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 989825d3a9c..2a5f5058bda 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -41,6 +41,8 @@
#include "commands/async.h"
#include "commands/extension.h"
#include "commands/event_trigger.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -479,6 +481,14 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -533,6 +543,16 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = true;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+bool progressive_explain_costs = true;
+int progressive_explain_min_duration = 1000;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2141,6 +2161,72 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on buffer usage is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_costs", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on the estimated startup and total cost of each plan node is added to progressive explains."),
+ gettext_noop("Equivalent to the COSTS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_costs,
+ true,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3858,6 +3944,30 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_min_duration", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Min query duration to start printing instrumented "
+ "progressive explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_min_duration,
+ 1000, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5406,6 +5516,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, assign_io_method, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 0b9e3066bde..89aa80afb5c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -670,6 +670,20 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_min_duration = 1s
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = on
+#progressive_explain_costs = on
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
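
As a usage sketch, these settings can also be applied per session; for
example, to print instrumented plans every 500ms for queries running longer
than 5 seconds (illustrative values, not recommendations):

    SET progressive_explain = 'analyze';
    SET progressive_explain_min_duration = '5s';
    SET progressive_explain_interval = '500ms';
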
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3f7b82e02bb..b47805fe8df 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12479,4 +12479,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: progressive explain plans of active backends',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => '',
+ proallargtypes => '{oid,int4,timestamptz,text}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{datid,pid,last_update,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain_progressive.h b/src/include/commands/explain_progressive.h
new file mode 100644
index 00000000000..9c888b6ebe9
--- /dev/null
+++ b/src/include/commands/explain_progressive.h
@@ -0,0 +1,56 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.h
+ * prototypes for explain_progressive.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * src/include/commands/explain_progressive.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXPLAIN_PROGRESSIVE_H
+#define EXPLAIN_PROGRESSIVE_H
+
+#include "datatype/timestamp.h"
+#include "executor/executor.h"
+
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
+typedef struct ProgressiveExplainHashEntry
+{
+ int pid; /* hash key of entry - MUST BE FIRST */
+ int plan_alloc_size;
+ dsa_handle pe_h;
+ dsa_pointer pe_p;
+} ProgressiveExplainHashEntry;
+
+typedef struct ProgressiveExplainData
+{
+ int pid;
+ TimestampTz last_update;
+ char plan[FLEXIBLE_ARRAY_MEMBER];
+} ProgressiveExplainData;
+
+extern bool ProgressiveExplainIsActive(QueryDesc *queryDesc);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+/* transaction cleanup code */
+extern void AtEOXact_ProgressiveExplain(void);
+
+extern bool ProgressiveExplainPending;
+
+#endif /* EXPLAIN_PROGRESSIVE_H */
diff --git a/src/include/commands/explain_state.h b/src/include/commands/explain_state.h
index 32728f5d1a1..64370a5d6ea 100644
--- a/src/include/commands/explain_state.h
+++ b/src/include/commands/explain_state.h
@@ -16,6 +16,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "parser/parse_node.h"
+#include "utils/dsa.h"
typedef enum ExplainSerializeOption
{
@@ -74,6 +75,14 @@ typedef struct ExplainState
/* extensions */
void **extension_state;
int extension_state_allocated;
+ /* set if tracking a progressive explain */
+ bool progressive;
+ /* current plan node in progressive explains */
+ struct PlanState *pe_curr_node;
+ /* reusable instr object used in progressive explains */
+ struct Instrumentation *pe_local_instr;
+ /* dsa area used to store progressive explain data */
+ dsa_area *pe_a;
} ExplainState;
typedef void (*ExplainOptionHandler) (ExplainState *, DefElem *, ParseState *);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..27692aee542 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pestate; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e42f9f9f957..c863cb8f032 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -763,6 +764,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1159,6 +1163,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+ ExecProcNodeMtd ExecProcNodeOriginal; /* pointer to original function
+ * if another wrapper was added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index ffa03189e2d..f3499d307f4 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -218,6 +218,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 932024b1b0b..7d88e7e9b58 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -84,3 +84,4 @@ PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
PG_LWLOCK(53, AioWorkerSubmissionQueue)
+PG_LWLOCK(54, ProgressiveExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index f619100467d..cff5c1f4cdb 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,16 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_min_duration;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
+extern PGDLLIMPORT bool progressive_explain_costs;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
@@ -322,6 +332,7 @@ extern PGDLLIMPORT const struct config_enum_entry io_method_options[];
extern PGDLLIMPORT const struct config_enum_entry recovery_target_action_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_level_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_sync_method_options[];
+extern PGDLLIMPORT const struct config_enum_entry explain_format_options[];
/*
* Functions exported by guc.c
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..ea66a0505d9 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,8 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..05e555a5e26
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the local
+# session, even if progressive explain is enabled. For other sessions it should
+# contain data for a PID only if progressive_explain is enabled and a query
+# is running. Data needs to be removed when the query finishes (or gets cancelled).
+
+use strict;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
+# Configure progressive explain to be printed immediately
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled pg_stat_progress_explain should not contain
+ # row for PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # 1 row for pid is expected for running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('explain');
+test_local_session('analyze');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('explain');
+test_peer_session('analyze');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 47478969135..62b70cf4618 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,13 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT s.datid,
+ d.datname,
+ s.pid,
+ s.last_update,
+ s.query_plan
+ FROM (pg_stat_progress_explain() s(datid, pid, last_update, query_plan)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3fbf5a4c212..0fcddb8ff88 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2296,6 +2296,9 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
+ProgressiveExplainData
+ProgressiveExplainHashEntry
ProjectSet
ProjectSetPath
ProjectSetState
--
2.43.0
Sending a new version that includes the new explain_progressive.c file in
meson.build.
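As a quick way to try the attached patch from a second session, the plan of an
in-flight query can be followed with a simple psql loop. This is just a sketch:
the <pid> placeholder has to be replaced with the PID of the backend running the
query, and the test table/columns are the same ones used in the perform.sgml
examples of the patch.

-- session 1: run a long query with progressive explain enabled
SET progressive_explain = 'analyze';
SELECT count(*) FROM test t1 JOIN test t2 ON t1.c1 = t2.c1;

-- session 2: re-print the in-flight plan every second
SELECT pid, last_update, query_plan
  FROM pg_stat_progress_explain
 WHERE pid = <pid>
\watch 1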
Rafael.
Attachments:
v12-0001-Proposal-for-progressive-explains.patch (application/octet-stream)
From 27285403a056ff03d43542188797774a79550169 Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Tue, 25 Mar 2025 19:05:59 -0300
Subject: [PATCH v12] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan will be printed
only once after progressive_explain_min_duration. If set to 'analyze'
the QueryDesc will be adjusted by adding instrumentation flags. In that
case the plan will be printed on a fixed interval controlled by the new
GUC parameter progressive_explain_interval, including all instrumentation
stats computed so far (per node rows and execution time).
New view:
- pg_stat_progress_explain
- datid: OID of the database
- datname: Name of the database
- pid: PID of the process running the query
- last_update: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_min_duration: min query duration until progressive
explain starts.
- type: int
- default: 1s
- min: 0
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [text, xml, json, yaml]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: true
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_costs: controls whether estimated startup and
total cost of each plan node is added to the printed plan.
- type: bool
- default: true
- context: user
---
contrib/auto_explain/auto_explain.c | 10 +-
doc/src/sgml/config.sgml | 156 ++++
doc/src/sgml/monitoring.sgml | 82 ++
doc/src/sgml/perform.sgml | 97 +++
src/backend/access/transam/xact.c | 2 +
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/Makefile | 1 +
src/backend/commands/explain.c | 162 ++--
src/backend/commands/explain_format.c | 12 +
src/backend/commands/explain_progressive.c | 746 ++++++++++++++++++
src/backend/commands/meson.build | 1 +
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 14 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 19 +
src/backend/utils/misc/guc_tables.c | 130 +++
src/backend/utils/misc/postgresql.conf.sample | 14 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain_progressive.h | 56 ++
src/include/commands/explain_state.h | 9 +
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 11 +
src/include/utils/timeout.h | 2 +
.../test_misc/t/008_progressive_explain.pl | 130 +++
src/test/regress/expected/rules.out | 7 +
src/tools/pgindent/typedefs.list | 3 +
33 files changed, 1688 insertions(+), 59 deletions(-)
create mode 100644 src/backend/commands/explain_progressive.c
create mode 100644 src/include/commands/explain_progressive.h
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 3b73bd19107..769b79ecc6b 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -39,14 +39,6 @@ static int auto_explain_log_level = LOG;
static bool auto_explain_log_nested_statements = false;
static double auto_explain_sample_rate = 1;
-static const struct config_enum_entry format_options[] = {
- {"text", EXPLAIN_FORMAT_TEXT, false},
- {"xml", EXPLAIN_FORMAT_XML, false},
- {"json", EXPLAIN_FORMAT_JSON, false},
- {"yaml", EXPLAIN_FORMAT_YAML, false},
- {NULL, 0, false}
-};
-
static const struct config_enum_entry loglevel_options[] = {
{"debug5", DEBUG5, false},
{"debug4", DEBUG4, false},
@@ -188,7 +180,7 @@ _PG_init(void)
NULL,
&auto_explain_log_format,
EXPLAIN_FORMAT_TEXT,
- format_options,
+ explain_format_options,
PGC_SUSET,
0,
NULL,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 69fc93dffc4..ff0b147877e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8674,6 +8674,162 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ When set to <literal>explain</literal>, the plan will be printed only
+ once after <xref linkend="guc-progressive-explain-min-duration"/>. If
+ set to <literal>analyze</literal>, instrumentation flags are enabled,
+ causing the plan to be printed on a fixed interval controlled by
+ <xref linkend="guc-progressive-explain-interval"/>, including all
+ instrumentation stats computed so far.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum query duration (in milliseconds) before the progressive
+ explain is printed for the first time. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on buffer usage is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on per-node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-costs" xreflabel="progressive_explain_costs">
+ <term><varname>progressive_explain_costs</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_costs</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on the estimated startup and total cost of
+ each plan node is added to progressive explains. Equivalent to the COSTS
+ option of EXPLAIN.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0960f5ba94a..14780c54565 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6840,6 +6840,88 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_update</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 387baac7e8c..04a78f29df9 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1169,6 +1169,103 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query can
+ be visualized by any session via the <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query and after the
+ minimum duration specified by <xref linkend="guc-progressive-explain-min-duration"/>
+ has passed. Settings <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/>,
+ <xref linkend="guc-progressive-explain-settings"/> and
+ <xref linkend="guc-progressive-explain-costs"/>
+ define how the plan is printed and which details are included in it.
+ </para>
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once, after <xref linkend="guc-progressive-explain-min-duration"/> has passed.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:01.606324-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ enables instrumentation, and the detailed plan is printed on a fixed interval
+ controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+ per-node accumulated row counts and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional per-node information to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:53.951884-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=1581.469..2963.158 rows=64862.00 loops=1) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.079..486.702 rows=8258962.00 loops=1) (current)+
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=1580.933..1580.933 rows=10000000.00 loops=1) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.004..566.961 rows=10000000.00 loops=1) +
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b885513f765..2a0214d4935 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,6 +36,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "common/pg_prng.h"
@@ -2993,6 +2994,7 @@ AbortTransaction(void)
AtEOXact_PgStat(false, is_parallel_worker);
AtEOXact_ApplyLauncher(false);
AtEOXact_LogicalRepWorkers(false);
+ AtEOXact_ProgressiveExplain();
pgstat_report_xact_timestamp(0);
}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31d269b7ee0..767735c1a2c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,16 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ S.datid AS datid,
+ D.datname AS datname,
+ S.pid,
+ S.last_update,
+ S.query_plan
+ FROM pg_stat_progress_explain() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index cb2fbdc7c60..e10224b2cd2 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -36,6 +36,7 @@ OBJS = \
explain.o \
explain_dr.o \
explain_format.o \
+ explain_progressive.o \
explain_state.o \
extension.o \
foreigncmds.o \
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 391b34a2af2..bca916634f8 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -20,9 +20,11 @@
#include "commands/explain.h"
#include "commands/explain_dr.h"
#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
#include "commands/explain_state.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
+#include "funcapi.h"
#include "jit/jit.h"
#include "libpq/pqformat.h"
#include "libpq/protocol.h"
@@ -34,6 +36,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
+#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/guc_tables.h"
#include "utils/json.h"
@@ -41,6 +44,7 @@
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
+#include "utils/timeout.h"
#include "utils/tuplesort.h"
#include "utils/typcache.h"
#include "utils/xml.h"
@@ -139,7 +143,7 @@ static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -596,6 +600,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
/* We can't run ExecutorEnd 'till we're done printing the stats... */
totaltime += elapsed_time(&starttime);
}
+ else
+ {
+ /*
+ * Handle progressive explain cleanup manually if enabled, as it is not
+ * called as part of ExecutorFinish.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+ }
/* grab serialization metrics before we destroy the DestReceiver */
if (es->serialize != EXPLAIN_SERIALIZE_NONE)
@@ -1371,6 +1384,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1834,17 +1848,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is called directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1854,6 +1889,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
}
else
{
@@ -1866,6 +1904,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
}
}
else if (es->analyze)
@@ -1970,13 +2012,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_indexsearches_info(planstate, es);
break;
case T_IndexOnlyScan:
@@ -1984,16 +2026,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
show_indexsearches_info(planstate, es);
break;
case T_BitmapIndexScan:
@@ -2006,11 +2048,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2027,7 +2069,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2038,7 +2080,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2062,7 +2104,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2096,7 +2138,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2110,7 +2152,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2128,7 +2170,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2145,14 +2187,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2162,7 +2204,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2172,11 +2214,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2185,11 +2227,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2198,11 +2240,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2210,7 +2252,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_window_def(castNode(WindowAggState, planstate), ancestors, es);
@@ -2219,7 +2261,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
break;
case T_Group:
@@ -2227,7 +2269,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2249,7 +2291,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2294,10 +2336,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3975,19 +4017,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4668,7 +4710,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4677,11 +4719,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4701,11 +4756,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
diff --git a/src/backend/commands/explain_format.c b/src/backend/commands/explain_format.c
index 752691d56db..c0d6973d1e5 100644
--- a/src/backend/commands/explain_format.c
+++ b/src/backend/commands/explain_format.c
@@ -16,6 +16,7 @@
#include "commands/explain.h"
#include "commands/explain_format.h"
#include "commands/explain_state.h"
+#include "utils/guc_tables.h"
#include "utils/json.h"
#include "utils/xml.h"
@@ -25,6 +26,17 @@
#define X_CLOSE_IMMEDIATE 2
#define X_NOWHITESPACE 4
+/*
+ * GUC support
+ */
+const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
diff --git a/src/backend/commands/explain_progressive.c b/src/backend/commands/explain_progressive.c
new file mode 100644
index 00000000000..657670661e9
--- /dev/null
+++ b/src/backend/commands/explain_progressive.c
@@ -0,0 +1,746 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.c
+ * Code for the progressive explain feature
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/explain_progressive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_authid.h"
+#include "commands/explain.h"
+#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "utils/acl.h"
+#include "utils/backend_status.h"
+#include "utils/builtins.h"
+#include "utils/guc_tables.h"
+#include "utils/timeout.h"
+
+
+#define PROGRESSIVE_EXPLAIN_FREE_SIZE 4096
+
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainHash = NULL;
+
+/* Pointer to running query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Flag set by timeouts to control when to update progressive explains */
+bool ProgressiveExplainPending = false;
+
+static void ProgressiveExplainInit(QueryDesc *queryDesc);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(QueryDesc *queryDesc);
+static TupleTableSlot *ExecProcNodeExplain(PlanState *node);
+static void WrapExecProcNodeWithExplain(PlanState *ps);
+static void UnwrapExecProcNodeWithExplain(PlanState *ps);
+
+
+
+/*
+ * ProgressiveExplainSetup -
+ * Adjusts QueryDesc with instrumentation for progressive explains.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we adjust QueryDesc accordingly even if
+ * the query was not initiated with EXPLAIN ANALYZE. This will
+ * directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Setup only if this is the outermost query */
+ if (activeQueryDesc == NULL)
+ {
+ /* Track the outermost query desc */
+ activeQueryDesc = queryDesc;
+ queryDesc->pestate = NULL;
+
+ /* Setup instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+ }
+}
+
+/*
+ * ProgressiveExplainStart
+ * Progressive explain start point.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ /*
+ * Progressive explain is only done for the outermost query descriptor.
+ */
+ if (queryDesc == activeQueryDesc)
+ {
+ /* Timeout is only needed if duration > 0 */
+ if (progressive_explain_min_duration == 0)
+ ProgressiveExplainInit(queryDesc);
+ else
+ enable_timeout_after(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ progressive_explain_min_duration);
+ }
+}
+
+/*
+ * ProgressiveExplainInit -
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ * We define an ExplainState that will be reused in every iteration of
+ * plan prints.
+ * Progressive explain plans are printed in shared memory via DSAs.
+ * A dynamic shared memory area is created to hold the progressive plans.
+ * A dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer pointers.
+ *
+ * A memory context release callback is defined for manual resource release
+ * in case of query cancellations.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise
+ * the plain plan is printed once.
+ */
+void
+ProgressiveExplainInit(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ ProgressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pestate = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+ es->costs = progressive_explain_costs;
+
+ /* Define the DSA and share through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ entry = (ProgressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_ENTER,
+ &found);
+
+ entry->pe_h = dsa_get_handle(es->pe_a);
+ entry->pe_p = (dsa_pointer) NULL;
+
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+ /* Printing progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainTrigger -
+ * Called by the timeout handler to start printing progressive
+ * explain plans.
+ */
+void
+ProgressiveExplainTrigger(void)
+{
+ /*
+ * Check that the pointer is still active to avoid corner case where the
+ * timeout function is called after cleanup is done.
+ */
+ if (activeQueryDesc)
+ WrapExecProcNodeWithExplain(activeQueryDesc->planstate);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pestate->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pestate->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint -
+ * Prints progressive explain in memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the currently printed plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+ QueryDesc *currentQueryDesc = queryDesc;
+ ProgressiveExplainHashEntry *entry;
+ ProgressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pestate;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+ entry = (ProgressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_FIND,
+ NULL);
+
+ /* Entry must already exist in shared memory at this point */
+ if (!entry)
+ elog(ERROR, "no entry in progressive explain hash for pid %d",
+ MyProcPid);
+ else
+ {
+ /* Plan was never printed */
+ if (!entry->pe_p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->pe_p);
+
+ /*
+ * Plan does not fit in existing shared memory area. Reallocation
+ * is needed.
+ */
+ if (strlen(es->str->data) > entry->plan_alloc_size)
+ {
+ dsa_free(es->pe_a, entry->pe_p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently printed
+ * query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_FREE_SIZE. This strategy prevents having to
+ * reallocate the segment very often, which would be needed in
+ * case the length of the next printed plan exceeds the previously
+ * allocated size.
+ */
+ entry->plan_alloc_size = add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_FREE_SIZE);
+ entry->pe_p = dsa_allocate(es->pe_a,
+ add_size(sizeof(ProgressiveExplainData),
+ entry->plan_alloc_size));
+ pe_data = dsa_get_address(es->pe_a, entry->pe_p);
+ pe_data->pid = MyProcPid;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_update = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ProgressiveExplainFinish -
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ /*
+ * Progressive explain is only done for the outermost query descriptor.
+ */
+ if (queryDesc == activeQueryDesc)
+ {
+ /* Startup timeout hasn't triggered yet, just disable it */
+ if (get_timeout_active(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT))
+ disable_timeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT, false);
+ /* Initial progressive explain was done, clean everything */
+ else if (queryDesc && queryDesc->pestate)
+ ProgressiveExplainCleanup(queryDesc);
+ }
+}
+
+/*
+ * ProgressiveExplainIsActive -
+ * Returns whether progressive explain is active for the given query descriptor.
+ */
+bool
+ProgressiveExplainIsActive(QueryDesc *queryDesc)
+{
+ if (queryDesc == activeQueryDesc)
+ return true;
+ else
+ return false;
+}
+
+/*
+ * End-of-transaction cleanup for progressive explains.
+ */
+void
+AtEOXact_ProgressiveExplain(void)
+{
+ /*
+ * Only perform cleanup if a query descriptor is being tracked, which means
+ * that the feature is enabled for the last query executed before the
+ * abort.
+ */
+ if (activeQueryDesc)
+ ProgressiveExplainCleanup(NULL);
+}
+
+/*
+ * ProgressiveExplainCleanup -
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures not automatically released by the memory
+ * context removal. Current tasks are:
+ * - remove local backend from progressive explain hash
+ * - detach from DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(QueryDesc *queryDesc)
+{
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ /* Reset querydesc tracker */
+ activeQueryDesc = NULL;
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /*
+ * Only detach from the DSA if the query ended gracefully, i.e., if
+ * ProgressiveExplainCleanup was called by function
+ * ProgressiveExplainFinish.
+ */
+ if (queryDesc)
+ dsa_detach(queryDesc->pestate->pe_a);
+ hash_search(progressiveExplainHash, &MyProcPid, HASH_REMOVE, NULL);
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ExecProcNodeExplain -
+ * ExecProcNode wrapper that initializes progressive explain
+ * and uwraps ExecProdNode to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)
+{
+ /* Initialize progressive explain */
+ ProgressiveExplainInit(node->state->query_desc);
+
+ /* Unwrap exec proc node for all nodes */
+ UnwrapExecProcNodeWithExplain(node->state->query_desc->planstate);
+
+ /*
+ * Since unwrapping has already been done, call ExecProcNode(), not
+ * ExecProcNodeOriginal().
+ */
+ return node->ExecProcNode(node);
+}
+
+/*
+ * ExecProcNodeInstrExplain -
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * WrapMemberNodesExecProcNodesWithExplain -
+ * Wrap array of PlanStates ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+WrapMemberNodesExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ WrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * WrapCustomPlanChildExecProcNodesWithExplain -
+ * Wrap CustomScanstate children's ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+WrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ WrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * WrapSubPlansExecProcNodesWithExplain -
+ * Wrap SubPlans ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+WrapSubPlansExecProcNodesWithExplain(List *plans)
+{
+ ListCell *lst;
+
+ foreach(lst, plans)
+ {
+ SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+ WrapExecProcNodeWithExplain(sps->planstate);
+ }
+}
+
+/*
+ * WrapExecProcNodeWithExplain -
+ * Wrap ExecProcNode with ExecProcNodeWithExplain recursively
+ */
+static void
+WrapExecProcNodeWithExplain(PlanState *ps)
+{
+ /* wrapping can be done only once */
+ if (ps->ExecProcNodeOriginal != NULL)
+ return;
+
+ check_stack_depth();
+
+ ps->ExecProcNodeOriginal = ps->ExecProcNode;
+ ps->ExecProcNode = ExecProcNodeExplain;
+
+ /* initPlan-s */
+ if (ps->initPlan != NULL)
+ WrapSubPlansExecProcNodesWithExplain(ps->initPlan);
+
+ /* lefttree */
+ if (ps->lefttree != NULL)
+ WrapExecProcNodeWithExplain(ps->lefttree);
+ /* righttree */
+ if (ps->righttree != NULL)
+ WrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ WrapMemberNodesExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ WrapMemberNodesExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ WrapMemberNodesExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ WrapMemberNodesExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ WrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ WrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * UnWrapMemberNodesExecProcNodesWithExplain -
+ * Unwrap array of PlanStates ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+UnWrapMemberNodesExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ UnwrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * UnwrapCustomPlanChildExecProcNodesWithExplain -
+ * Unwrap CustomScanstate children's ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+UnwrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ UnwrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * UnwrapSubPlansExecProcNodesWithExplain -
+ * Unwrap SubPlans ExecProcNodes with ExecProcNodeWithExplain
+ */
+static void
+UnwrapSubPlansExecProcNodesWithExplain(List *plans)
+{
+ ListCell *lst;
+
+ foreach(lst, plans)
+ {
+ SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+ UnwrapExecProcNodeWithExplain(sps->planstate);
+ }
+}
+
+/*
+ * UnwrapExecProcNodeWithExplain -
+ * Unwrap ExecProcNode with ExecProcNodeWithExplain recursively
+ */
+static void
+UnwrapExecProcNodeWithExplain(PlanState *ps)
+{
+ Assert(ps->ExecProcNodeOriginal != NULL);
+
+ check_stack_depth();
+
+ ps->ExecProcNode = ps->ExecProcNodeOriginal;
+ ps->ExecProcNodeOriginal = NULL;
+
+ /* initPlan-s */
+ if (ps->initPlan != NULL)
+ UnwrapSubPlansExecProcNodesWithExplain(ps->initPlan);
+
+ /* lefttree */
+ if (ps->lefttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->lefttree);
+ /* righttree */
+ if (ps->righttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ UnWrapMemberNodesExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ UnWrapMemberNodesExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ UnWrapMemberNodesExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ UnWrapMemberNodesExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ UnwrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ UnwrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(ProgressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash -
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(int);
+ info.entrysize = sizeof(ProgressiveExplainHashEntry);
+
+ progressiveExplainHash = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain -
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 4
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ HASH_SEQ_STATUS hash_seq;
+ ProgressiveExplainHashEntry *entry;
+ dsa_area *a;
+ ProgressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainHash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->pe_p) ||
+ MyProcPid == entry->pid)
+ continue;
+
+ a = dsa_attach(entry->pe_h);
+ ped = dsa_get_address(a, entry->pe_p);
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+ values[1] = Int32GetDatum(ped->pid);
+ values[2] = TimestampTzGetDatum(ped->last_update);
+
+ if (has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) ||
+ has_privs_of_role(GetUserId(), beentry->st_userid))
+ values[3] = CStringGetTextDatum(ped->plan);
+ else
+ values[3] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index dd4cde41d32..2bb0ac7d286 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -24,6 +24,7 @@ backend_sources += files(
'explain.c',
'explain_dr.c',
'explain_format.c',
+ 'explain_progressive.c',
'explain_state.c',
'extension.c',
'foreigncmds.c',
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e9bd98c7738..cbb074eaa93 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain_progressive.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -157,6 +158,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Set up progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -182,6 +189,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Add a back-reference to the QueryDesc.
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -267,6 +279,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -516,6 +534,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..82fe90bf231 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain_progressive.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -462,7 +464,17 @@ ExecProcNodeFirst(PlanState *node)
* have ExecProcNode() directly call the relevant function from now on.
*/
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ /*
+ * Use wrapper for progressive explains if enabled and the node
+ * belongs to the currently tracked query descriptor.
+ */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE &&
+ ProgressiveExplainIsActive(node->state->query_desc))
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..389f5b55831 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -302,6 +304,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up progressive explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 5702c35bb91..7097312b1a8 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -177,6 +177,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 9fa12a555e8..b53bc61d0d8 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -349,6 +349,7 @@ DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
AioWorkerSubmissionQueue "Waiting to access AIO worker submission queue."
+ProgressiveExplainHash "Waiting to access backend progressive explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7958ea11b73..3b55a061f3e 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain_progressive.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -82,6 +83,8 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainStartupTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -771,6 +774,10 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ ProgressiveExplainStartupTimeoutHandler);
}
/*
@@ -1432,6 +1439,18 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainStartupTimeoutHandler(void)
+{
+ ProgressiveExplainTrigger();
+}
+
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 989825d3a9c..2a5f5058bda 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -41,6 +41,8 @@
#include "commands/async.h"
#include "commands/extension.h"
#include "commands/event_trigger.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -479,6 +481,14 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -533,6 +543,16 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = true;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+bool progressive_explain_costs = true;
+int progressive_explain_min_duration = 1000;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2141,6 +2161,72 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on buffer usage is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_costs", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on the estimated startup and total cost of each plan node is added to progressive explains."),
+ gettext_noop("Equivalent to the COSTS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_costs,
+ true,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3858,6 +3944,30 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_min_duration", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Min query duration to start printing instrumented "
+ "progressive explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_min_duration,
+ 1000, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5406,6 +5516,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, assign_io_method, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 0b9e3066bde..89aa80afb5c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -670,6 +670,20 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_min_duration = 1s
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = on
+#progressive_explain_costs = on
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index df0370256dc..7efc923ba25 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12486,4 +12486,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: progressive explain plans of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => '',
+ proallargtypes => '{oid,int4,timestamptz,text}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{datid,pid,last_update,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain_progressive.h b/src/include/commands/explain_progressive.h
new file mode 100644
index 00000000000..9c888b6ebe9
--- /dev/null
+++ b/src/include/commands/explain_progressive.h
@@ -0,0 +1,56 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.h
+ * prototypes for explain_progressive.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * src/include/commands/explain_progressive.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXPLAIN_PROGRESSIVE_H
+#define EXPLAIN_PROGRESSIVE_H
+
+#include "datatype/timestamp.h"
+#include "executor/executor.h"
+
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
+typedef struct ProgressiveExplainHashEntry
+{
+ int pid; /* hash key of entry - MUST BE FIRST */
+ int plan_alloc_size;
+ dsa_handle pe_h;
+ dsa_pointer pe_p;
+} ProgressiveExplainHashEntry;
+
+typedef struct ProgressiveExplainData
+{
+ int pid;
+ TimestampTz last_update;
+ char plan[FLEXIBLE_ARRAY_MEMBER];
+} ProgressiveExplainData;
+
+extern bool ProgressiveExplainIsActive(QueryDesc *queryDesc);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+/* transaction cleanup code */
+extern void AtEOXact_ProgressiveExplain(void);
+
+extern bool ProgressiveExplainPending;
+
+#endif /* EXPLAIN_PROGRESSIVE_H */
diff --git a/src/include/commands/explain_state.h b/src/include/commands/explain_state.h
index 32728f5d1a1..64370a5d6ea 100644
--- a/src/include/commands/explain_state.h
+++ b/src/include/commands/explain_state.h
@@ -16,6 +16,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "parser/parse_node.h"
+#include "utils/dsa.h"
typedef enum ExplainSerializeOption
{
@@ -74,6 +75,14 @@ typedef struct ExplainState
/* extensions */
void **extension_state;
int extension_state_allocated;
+ /* set if tracking a progressive explain */
+ bool progressive;
+ /* current plan node in progressive explains */
+ struct PlanState *pe_curr_node;
+ /* reusable instr object used in progressive explains */
+ struct Instrumentation *pe_local_instr;
+ /* dsa area used to store progressive explain data */
+ dsa_area *pe_a;
} ExplainState;
typedef void (*ExplainOptionHandler) (ExplainState *, DefElem *, ParseState *);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..27692aee542 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pestate; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e42f9f9f957..c863cb8f032 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -763,6 +764,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1159,6 +1163,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+ ExecProcNodeMtd ExecProcNodeOriginal; /* pointer to original function when
+ * another wrapper was added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index ffa03189e2d..f3499d307f4 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -218,6 +218,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 932024b1b0b..7d88e7e9b58 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -84,3 +84,4 @@ PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
PG_LWLOCK(53, AioWorkerSubmissionQueue)
+PG_LWLOCK(54, ProgressiveExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index f619100467d..cff5c1f4cdb 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,16 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_min_duration;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
+extern PGDLLIMPORT bool progressive_explain_costs;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
@@ -322,6 +332,7 @@ extern PGDLLIMPORT const struct config_enum_entry io_method_options[];
extern PGDLLIMPORT const struct config_enum_entry recovery_target_action_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_level_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_sync_method_options[];
+extern PGDLLIMPORT const struct config_enum_entry explain_format_options[];
/*
* Functions exported by guc.c
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..ea66a0505d9 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,8 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..05e555a5e26
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the local
+# session, even if progressive explain is enabled. For other sessions, it should
+# contain data for a PID only if progressive_explain is enabled and a query is
+# running. Data needs to be removed when the query finishes (or gets cancelled).
+
+use strict;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
+# Configure progressive explain to be logged immediately
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled pg_stat_progress_explain should not contain
+ # row for PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # 1 row for pid is expected for running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('explain');
+test_local_session('analyze');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('explain');
+test_peer_session('analyze');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 47478969135..62b70cf4618 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,13 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT s.datid,
+ d.datname,
+ s.pid,
+ s.last_update,
+ s.query_plan
+ FROM (pg_stat_progress_explain() s(datid, pid, last_update, query_plan)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3fbf5a4c212..0fcddb8ff88 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2296,6 +2296,9 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
+ProgressiveExplainData
+ProgressiveExplainHashEntry
ProjectSet
ProjectSetPath
ProjectSetState
--
2.43.0
On Tue, Mar 25, 2025 at 8:52 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
This first version of the progressive explain feature was designed to only keep
track of the initial query called by the backend, ignoring all subquery calls. So
I believe we don't need to worry about adding custom logic in
AbortSubTransaction(). In case the query errors out, AbortTransaction() will be
called and everything related to progressive explains will be cleaned up.
Suppose:
BEGIN;
SELECT 1;
SAVEPOINT bob;
progressively explain something that aborts
I think in this case we will call AbortSubTransaction(), not AbortTransaction().
--
Robert Haas
EDB: http://www.enterprisedb.com
Suppose:
BEGIN;
SELECT 1;
SAVEPOINT bob;
progressively explain something that aborts
I think in this case we will call AbortSubTransaction(), not
AbortTransaction().
Indeed. We need special treatment for subtransactions. There are two
scenarios where AbortSubTransaction() will be called alone and can affect
progressive explains: savepoints and PL/pgSQL exception blocks.
We don't want subxact aborts in PL/pgSQL exception blocks to perform cleanup,
because the original query keeps running and progressive explain tracking is
still applicable there.
So the implementation is based on the transaction nesting level. Cleanup is
only performed when AbortSubTransaction() is called at the same transaction
nesting level as the one where the query is running. This covers both
PL/pgSQL exception blocks and savepoints, as sketched below.
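For illustration, here is a minimal sketch of the two scenarios (the statements
are hypothetical, not taken from the patch or its tests, and assume
progressive_explain is enabled in the session):

-- Savepoint case: the failing statement runs inside the savepoint's
-- subtransaction, so AbortSubTransaction() fires at the same nesting level
-- where the tracked query is running and its plan entry is cleaned up.
BEGIN;
SAVEPOINT bob;
SELECT 1/0;                  -- aborts; its pg_stat_progress_explain row goes away
ROLLBACK TO SAVEPOINT bob;
COMMIT;

-- Exception-block case: only the inner subtransaction (one nesting level
-- deeper than the tracked DO statement) aborts, so no cleanup is done and
-- the outer statement keeps being tracked.
DO $$
BEGIN
  PERFORM 1/0;               -- error is caught by the handler below
EXCEPTION WHEN division_by_zero THEN
  PERFORM pg_sleep(60);      -- outer statement continues running
END;
$$;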
Rafael.
Attachments:
v13-0001-Proposal-for-progressive-explains.patch (application/octet-stream)
From fead2d30bf6116f48bbe630a73a1161c081c7ed6 Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Tue, 25 Mar 2025 19:05:59 -0300
Subject: [PATCH v13] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When progressive_explain is set to 'explain' the plan will be printed
only once after progressive_explain_min_duration. If set to 'analyze'
the QueryDesc will be adjusted to add instrumentation flags. In that
case the plan will be printed at a fixed interval controlled by the new
GUC parameter progressive_explain_interval, including all instrumentation
stats computed so far (per node rows and execution time).
New view:
- pg_stat_progress_explain
- datid: OID of the database
- datname: Name of the database
- pid: PID of the process running the query
- last_update: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: enum
- default: off
- values: [off, explain, analyze]
- context: user
- progressive_explain_min_duration: min query duration until progressive
explain starts.
- type: int
- default: 1s
- min: 0
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 1s
- min: 10ms
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [TEXT, XML, JSON, or YAML]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: true
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_costs: controls whether the estimated startup and
total cost of each plan node is added to the printed plan.
- type: bool
- default: true
- context: user
---
contrib/auto_explain/auto_explain.c | 10 +-
doc/src/sgml/config.sgml | 156 ++++
doc/src/sgml/monitoring.sgml | 82 ++
doc/src/sgml/perform.sgml | 97 +++
src/backend/access/transam/xact.c | 3 +
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/Makefile | 1 +
src/backend/commands/explain.c | 159 ++--
src/backend/commands/explain_format.c | 12 +
src/backend/commands/explain_progressive.c | 762 ++++++++++++++++++
src/backend/commands/meson.build | 1 +
src/backend/executor/execMain.c | 24 +
src/backend/executor/execProcnode.c | 14 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
src/backend/tcop/pquery.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 19 +
src/backend/utils/misc/guc_tables.c | 130 +++
src/backend/utils/misc/postgresql.conf.sample | 14 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain_progressive.h | 57 ++
src/include/commands/explain_state.h | 9 +
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 11 +
src/include/utils/timeout.h | 2 +
.../test_misc/t/008_progressive_explain.pl | 130 +++
src/test/regress/expected/rules.out | 7 +
src/tools/pgindent/typedefs.list | 3 +
34 files changed, 1706 insertions(+), 59 deletions(-)
create mode 100644 src/backend/commands/explain_progressive.c
create mode 100644 src/include/commands/explain_progressive.h
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index cd6625020a7..0d28ae2ffe1 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -42,14 +42,6 @@ static int auto_explain_log_level = LOG;
static bool auto_explain_log_nested_statements = false;
static double auto_explain_sample_rate = 1;
-static const struct config_enum_entry format_options[] = {
- {"text", EXPLAIN_FORMAT_TEXT, false},
- {"xml", EXPLAIN_FORMAT_XML, false},
- {"json", EXPLAIN_FORMAT_JSON, false},
- {"yaml", EXPLAIN_FORMAT_YAML, false},
- {NULL, 0, false}
-};
-
static const struct config_enum_entry loglevel_options[] = {
{"debug5", DEBUG5, false},
{"debug4", DEBUG4, false},
@@ -191,7 +183,7 @@ _PG_init(void)
NULL,
&auto_explain_log_format,
EXPLAIN_FORMAT_TEXT,
- format_options,
+ explain_format_options,
PGC_SUSET,
0,
NULL,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 69fc93dffc4..ff0b147877e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8674,6 +8674,162 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ When set to <literal>explain</literal> the plan will be printed only
+ once after <xref linkend="guc-progressive-explain-min-duration"/>. If
+ set to <literal>analyze</literal>, instrumentation flags are enabled,
+ causing the plan to be printed on a fixed interval controlled by
+ <xref linkend="guc-progressive-explain-interval"/> including all
+ instrumentation stats computed so far.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum query duration (in milliseconds) before the progressive
+ explain is printed for the first time. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on buffer usage is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on per-node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-costs" xreflabel="progressive_explain_costs">
+ <term><varname>progressive_explain_costs</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_costs</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on the estimated startup and total cost of
+ each plan node is added to progressive explains. Equivalent to the COSTS
+ option of EXPLAIN.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0960f5ba94a..14780c54565 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6840,6 +6840,88 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_update</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 387baac7e8c..04a78f29df9 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1169,6 +1169,103 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any active query can be
+ visualized by any session via the
+ <xref linkend="pg-stat-progress-explain-view"/> view when
+ <xref linkend="guc-progressive-explain"/> is enabled in the client
+ backend or parallel worker executing the query and after the minimum
+ duration specified by <xref linkend="guc-progressive-explain-min-duration"/>
+ has passed. The settings <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/>,
+ <xref linkend="guc-progressive-explain-settings"/> and
+ <xref linkend="guc-progressive-explain-costs"/>
+ define how the plan is printed and which details are included.
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:01.606324-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ will enable instrumentation and the detailed plan is printed on a fixed interval
+ controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+ per node accumulated row count and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional per-node information to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:53.951884-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=1581.469..2963.158 rows=64862.00 loops=1) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.079..486.702 rows=8258962.00 loops=1) (current)+
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=1580.933..1580.933 rows=10000000.00 loops=1) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.004..566.961 rows=10000000.00 loops=1) +
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b885513f765..fcdc4a8fefe 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,6 +36,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "common/pg_prng.h"
@@ -2993,6 +2994,7 @@ AbortTransaction(void)
AtEOXact_PgStat(false, is_parallel_worker);
AtEOXact_ApplyLauncher(false);
AtEOXact_LogicalRepWorkers(false);
+ AtAbort_ProgressiveExplain();
pgstat_report_xact_timestamp(0);
}
@@ -5361,6 +5363,7 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
+ AtSubAbort_ProgressiveExplain();
}
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31d269b7ee0..767735c1a2c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,16 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ S.datid AS datid,
+ D.datname AS datname,
+ S.pid,
+ S.last_update,
+ S.query_plan
+ FROM pg_stat_progress_explain() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index cb2fbdc7c60..e10224b2cd2 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -36,6 +36,7 @@ OBJS = \
explain.o \
explain_dr.o \
explain_format.o \
+ explain_progressive.o \
explain_state.o \
extension.o \
foreigncmds.o \
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 391b34a2af2..359ae31e7d0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -20,6 +20,7 @@
#include "commands/explain.h"
#include "commands/explain_dr.h"
#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
#include "commands/explain_state.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
@@ -139,7 +140,7 @@ static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -596,6 +597,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
/* We can't run ExecutorEnd 'till we're done printing the stats... */
totaltime += elapsed_time(&starttime);
}
+ else
+ {
+ /*
+ * Handle progressive explain cleanup manually if enabled as it is not
+ * called as part of ExecutorFinish.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+ }
/* grab serialization metrics before we destroy the DestReceiver */
if (es->serialize != EXPLAIN_SERIALIZE_NONE)
@@ -1371,6 +1381,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1834,17 +1845,38 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is done directly on the
+ * main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may
+ * still be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
+ {
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
+
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ /* Use main instrumentation */
+ else
+ {
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
+ }
+ }
if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ local_instr && local_instr->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1854,6 +1886,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
}
else
{
@@ -1866,6 +1901,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ /* Progressive explain: add the current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
}
}
else if (es->analyze)
@@ -1970,13 +2009,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_indexsearches_info(planstate, es);
break;
case T_IndexOnlyScan:
@@ -1984,16 +2023,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
show_indexsearches_info(planstate, es);
break;
case T_BitmapIndexScan:
@@ -2006,11 +2045,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2027,7 +2066,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2038,7 +2077,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2062,7 +2101,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2096,7 +2135,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2110,7 +2149,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2128,7 +2167,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2145,14 +2184,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2162,7 +2201,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2172,11 +2211,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2185,11 +2224,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2198,11 +2237,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2210,7 +2249,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_window_def(castNode(WindowAggState, planstate), ancestors, es);
@@ -2219,7 +2258,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
break;
case T_Group:
@@ -2227,7 +2266,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2249,7 +2288,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2294,10 +2333,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3975,19 +4014,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4668,7 +4707,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4677,11 +4716,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4701,11 +4753,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
diff --git a/src/backend/commands/explain_format.c b/src/backend/commands/explain_format.c
index 752691d56db..c0d6973d1e5 100644
--- a/src/backend/commands/explain_format.c
+++ b/src/backend/commands/explain_format.c
@@ -16,6 +16,7 @@
#include "commands/explain.h"
#include "commands/explain_format.h"
#include "commands/explain_state.h"
+#include "utils/guc_tables.h"
#include "utils/json.h"
#include "utils/xml.h"
@@ -25,6 +26,17 @@
#define X_CLOSE_IMMEDIATE 2
#define X_NOWHITESPACE 4
+/*
+ * GUC support
+ */
+const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
diff --git a/src/backend/commands/explain_progressive.c b/src/backend/commands/explain_progressive.c
new file mode 100644
index 00000000000..17516aee0d7
--- /dev/null
+++ b/src/backend/commands/explain_progressive.c
@@ -0,0 +1,762 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.c
+ * Code for the progressive explain feature
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/explain_progressive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "catalog/pg_authid.h"
+#include "commands/explain.h"
+#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "utils/acl.h"
+#include "utils/backend_status.h"
+#include "utils/builtins.h"
+#include "utils/guc_tables.h"
+#include "utils/timeout.h"
+
+
+#define PROGRESSIVE_EXPLAIN_FREE_SIZE 4096
+
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainHash = NULL;
+
+/* Pointer to running query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Transaction nest level of the tracked query */
+static int activeQueryXactNestLevel = 0;
+
+/* Flag set by timeouts to control when to update progressive explains */
+bool ProgressiveExplainPending = false;
+
+static void ProgressiveExplainInit(QueryDesc *queryDesc);
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(QueryDesc *queryDesc);
+static TupleTableSlot *ExecProcNodeExplain(PlanState *node);
+static void WrapExecProcNodeWithExplain(PlanState *ps);
+static void UnwrapExecProcNodeWithExplain(PlanState *ps);
+
+
+
+/*
+ * ProgressiveExplainSetup -
+ * Track query descriptor and adjust instrumentation.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we adjust QueryDesc accordingly even if
+ * the query was not initiated with EXPLAIN ANALYZE. This will
+ * directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Setup only if this is the outer most query */
+ if (activeQueryDesc == NULL)
+ {
+ activeQueryDesc = queryDesc;
+ activeQueryXactNestLevel = GetCurrentTransactionNestLevel();
+
+ /* Setup instrumentation if enabled */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+ }
+}
+
+/*
+ * ProgressiveExplainStart
+ * Progressive explain start point.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ /*
+ * Progressive explain is only done for the outermost query descriptor.
+ */
+ if (queryDesc == activeQueryDesc)
+ {
+ /* Timeout is only needed if duration > 0 */
+ if (progressive_explain_min_duration == 0)
+ ProgressiveExplainInit(queryDesc);
+ else
+ enable_timeout_after(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ progressive_explain_min_duration);
+ }
+}
+
+/*
+ * ProgressiveExplainInit -
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ * We define an ExplainState that will be reused in every iteration of
+ * plan printing.
+ *
+ * Progressive explain plans are printed to shared memory via DSAs. A
+ * dynamic shared memory area is created to hold the progressive plans.
+ * Each backend printing plans has its own DSA, which is shared with other
+ * backends via the global progressive explain hash through dsa_handle and
+ * dsa_pointer values.
+ *
+ * A memory context release callback is defined for manual resource release
+ * in case of query cancellations.
+ *
+ * A periodic timeout is configured to print the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise,
+ * the plain plan is printed only once.
+ */
+void
+ProgressiveExplainInit(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ ProgressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Initialize ExplainState to be used for all prints */
+ es = NewExplainState();
+ queryDesc->pestate = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that the plan printing function
+ * adjusts logic accordingly.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+ es->costs = progressive_explain_costs;
+
+ /* Define the DSA and share through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry keyed by this backend's PID */
+ entry = (ProgressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_ENTER,
+ &found);
+
+ entry->pe_h = dsa_get_handle(es->pe_a);
+ entry->pe_p = InvalidDsaPointer;
+
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+ /* Print the progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainTrigger -
+ * Called by the timeout handler to start printing progressive
+ * explain plans.
+ */
+void
+ProgressiveExplainTrigger(void)
+{
+ /*
+ * Check that the pointer is still active to avoid corner case where the
+ * timeout function is called after cleanup is done.
+ */
+ if (activeQueryDesc)
+ WrapExecProcNodeWithExplain(activeQueryDesc->planstate);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pestate->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pestate->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint -
+ * Prints progressive explain in memory.
+ *
+ * This function resets the reusable ExplainState, prints the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the currently printed plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+ QueryDesc *currentQueryDesc = queryDesc;
+ ProgressiveExplainHashEntry *entry;
+ ProgressiveExplainHashData *pe_data;
+ ExplainState *es = queryDesc->pestate;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+ entry = (ProgressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_FIND,
+ NULL);
+
+ /* Entry must already exist in shared memory at this point */
+ if (!entry)
+ elog(ERROR, "no entry in progressive explain hash for pid %d",
+ MyProcPid);
+ else
+ {
+ /* Plan was never printed */
+ if (!entry->pe_p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->pe_p);
+
+ /*
+ * Plan does not fit in existing shared memory area. Reallocation
+ * is needed.
+ */
+ if (strlen(es->str->data) >= entry->plan_alloc_size)
+ {
+ dsa_free(es->pe_a, entry->pe_p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently printed
+ * query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_FREE_SIZE. This strategy prevents having to
+ * reallocate the segment very often, which would be needed in
+ * case the length of the next printed plan exceeds the previously
+ * allocated size.
+ */
+ entry->plan_alloc_size = add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_FREE_SIZE);
+ entry->pe_p = dsa_allocate(es->pe_a,
+ add_size(sizeof(ProgressiveExplainHashData),
+ entry->plan_alloc_size));
+ pe_data = dsa_get_address(es->pe_a, entry->pe_p);
+ pe_data->pid = MyProcPid;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_update = GetCurrentTimestamp();
+ }
+
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ProgressiveExplainFinish -
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ /*
+ * Progressive explain is only done for the outermost query descriptor.
+ */
+ if (queryDesc == activeQueryDesc)
+ {
+ /* Startup timeout hasn't triggered yet, just disable it */
+ if (get_timeout_active(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT))
+ disable_timeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT, false);
+ /* Initial progressive explain was done, clean everything */
+ else if (queryDesc && queryDesc->pestate)
+ ProgressiveExplainCleanup(queryDesc);
+ }
+}
+
+/*
+ * ProgressiveExplainIsActive -
+ * Returns whether the given query descriptor is currently being tracked.
+ */
+bool
+ProgressiveExplainIsActive(QueryDesc *queryDesc)
+{
+ return (queryDesc == activeQueryDesc);
+}
+
+/*
+ * End-of-transaction cleanup for progressive explains.
+ */
+void
+AtAbort_ProgressiveExplain(void)
+{
+ /* Only perform cleanup if query descriptor is being tracked */
+ if (activeQueryDesc)
+ ProgressiveExplainCleanup(NULL);
+}
+
+/*
+ * End-of-subtransaction cleanup for progressive explains.
+ */
+void
+AtSubAbort_ProgressiveExplain(void)
+{
+ /*
+ * Only perform cleanup if a query descriptor is being tracked and the
+ * current transaction nesting level is the same as the level of the
+ * tracked query. This is to avoid doing cleanup in subtransaction aborts
+ * triggered by exception blocks in functions and procedures.
+ */
+ if (activeQueryDesc &&
+ activeQueryXactNestLevel == GetCurrentTransactionNestLevel())
+ ProgressiveExplainCleanup(NULL);
+}
+
+/*
+ * ProgressiveExplainCleanup -
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures not automatically released by the memory
+ * context removal. Current tasks are:
+ * - remove local backend from progressive explain hash
+ * - detach from DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(QueryDesc *queryDesc)
+{
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ /* Reset querydesc tracker */
+ activeQueryDesc = NULL;
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /*
+ * Only detach from the DSA if the query ended gracefully, i.e., if
+ * ProgressiveExplainCleanup was called from
+ * ProgressiveExplainFinish.
+ */
+ if (queryDesc)
+ dsa_detach(queryDesc->pestate->pe_a);
+ hash_search(progressiveExplainHash, &MyProcPid, HASH_REMOVE, NULL);
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ExecProcNodeExplain -
+ * ExecProcNode wrapper that initializes progressive explain
+ * and unwraps ExecProcNode back to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)
+{
+ /* Initialize progressive explain */
+ ProgressiveExplainInit(node->state->query_desc);
+
+ /* Unwrap ExecProcNode for all nodes */
+ UnwrapExecProcNodeWithExplain(node->state->query_desc->planstate);
+
+ /*
+ * Since unwrapping has already been done, call ExecProcNode(), not
+ * ExecProcNodeOriginal().
+ */
+ return node->ExecProcNode(node);
+}
+
+/*
+ * ExecProcNodeInstrExplain -
+ * ExecProcNode wrapper that performs instrumentation calls and prints
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update the progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * WrapMemberNodesExecProcNodesWithExplain -
+ * Recursively wrap the ExecProcNode of each PlanState in the array
+ */
+static void
+WrapMemberNodesExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ WrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * WrapCustomPlanChildExecProcNodesWithExplain -
+ * Recursively wrap the ExecProcNode of each CustomScanState child
+ */
+static void
+WrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ WrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * WrapSubPlansExecProcNodesWithExplain -
+ * Recursively wrap the ExecProcNode of each SubPlanState in the list
+ */
+static void
+WrapSubPlansExecProcNodesWithExplain(List *plans)
+{
+ ListCell *lst;
+
+ foreach(lst, plans)
+ {
+ SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+ WrapExecProcNodeWithExplain(sps->planstate);
+ }
+}
+
+/*
+ * WrapExecProcNodeWithExplain -
+ * Recursively wrap a node's ExecProcNode with ExecProcNodeExplain
+ */
+static void
+WrapExecProcNodeWithExplain(PlanState *ps)
+{
+ /* wrapping can be done only once */
+ if (ps->ExecProcNodeOriginal != NULL)
+ return;
+
+ check_stack_depth();
+
+ ps->ExecProcNodeOriginal = ps->ExecProcNode;
+ ps->ExecProcNode = ExecProcNodeExplain;
+
+ /* initPlan-s */
+ if (ps->initPlan != NULL)
+ WrapSubPlansExecProcNodesWithExplain(ps->initPlan);
+
+ /* lefttree */
+ if (ps->lefttree != NULL)
+ WrapExecProcNodeWithExplain(ps->lefttree);
+ /* righttree */
+ if (ps->righttree != NULL)
+ WrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ WrapMemberNodesExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ WrapMemberNodesExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ WrapMemberNodesExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ WrapMemberNodesExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ WrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ WrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * UnWrapMemberNodesExecProcNodesWithExplain -
+ * Recursively unwrap the ExecProcNode of each PlanState in the array
+ */
+static void
+UnWrapMemberNodesExecProcNodesWithExplain(PlanState **planstates, int nplans)
+{
+ int i;
+
+ for (i = 0; i < nplans; i++)
+ UnwrapExecProcNodeWithExplain(planstates[i]);
+}
+
+/*
+ * UnwrapCustomPlanChildExecProcNodesWithExplain -
+ * Recursively unwrap the ExecProcNode of each CustomScanState child
+ */
+static void
+UnwrapCustomPlanChildExecProcNodesWithExplain(CustomScanState *css)
+{
+ ListCell *cell;
+
+ foreach(cell, css->custom_ps)
+ UnwrapExecProcNodeWithExplain((PlanState *) lfirst(cell));
+}
+
+/*
+ * UnwrapSubPlansExecProcNodesWithExplain -
+ * Recursively unwrap the ExecProcNode of each SubPlanState in the list
+ */
+static void
+UnwrapSubPlansExecProcNodesWithExplain(List *plans)
+{
+ ListCell *lst;
+
+ foreach(lst, plans)
+ {
+ SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+ UnwrapExecProcNodeWithExplain(sps->planstate);
+ }
+}
+
+/*
+ * UnwrapExecProcNodeWithExplain -
+ * Recursively restore a node's original ExecProcNode
+ */
+static void
+UnwrapExecProcNodeWithExplain(PlanState *ps)
+{
+ Assert(ps->ExecProcNodeOriginal != NULL);
+
+ check_stack_depth();
+
+ ps->ExecProcNode = ps->ExecProcNodeOriginal;
+ ps->ExecProcNodeOriginal = NULL;
+
+ /* initPlan-s */
+ if (ps->initPlan != NULL)
+ UnwrapSubPlansExecProcNodesWithExplain(ps->initPlan);
+
+ /* lefttree */
+ if (ps->lefttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->lefttree);
+ /* righttree */
+ if (ps->righttree != NULL)
+ UnwrapExecProcNodeWithExplain(ps->righttree);
+
+ /* special child plans */
+ switch (nodeTag(ps->plan))
+ {
+ case T_Append:
+ UnWrapMemberNodesExecProcNodesWithExplain(((AppendState *) ps)->appendplans,
+ ((AppendState *) ps)->as_nplans);
+ break;
+ case T_MergeAppend:
+ UnWrapMemberNodesExecProcNodesWithExplain(((MergeAppendState *) ps)->mergeplans,
+ ((MergeAppendState *) ps)->ms_nplans);
+ break;
+ case T_BitmapAnd:
+ UnWrapMemberNodesExecProcNodesWithExplain(((BitmapAndState *) ps)->bitmapplans,
+ ((BitmapAndState *) ps)->nplans);
+ break;
+ case T_BitmapOr:
+ UnWrapMemberNodesExecProcNodesWithExplain(((BitmapOrState *) ps)->bitmapplans,
+ ((BitmapOrState *) ps)->nplans);
+ break;
+ case T_SubqueryScan:
+ UnwrapExecProcNodeWithExplain(((SubqueryScanState *) ps)->subplan);
+ break;
+ case T_CustomScan:
+ UnwrapCustomPlanChildExecProcNodesWithExplain((CustomScanState *) ps);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(ProgressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash -
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(int);
+ info.entrysize = sizeof(ProgressiveExplainHashEntry);
+
+ progressiveExplainHash = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain -
+ * Return information about progressive explains of active queries.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 4
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ HASH_SEQ_STATUS hash_seq;
+ ProgressiveExplainHashEntry *entry;
+ dsa_area *a;
+ ProgressiveExplainHashData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainHash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->pe_p) ||
+ MyProcPid == entry->pid)
+ continue;
+
+ a = dsa_attach(entry->pe_h);
+ ped = dsa_get_address(a, entry->pe_p);
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+ values[1] = Int32GetDatum(ped->pid);
+ values[2] = TimestampTzGetDatum(ped->last_update);
+
+ if (has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) ||
+ has_privs_of_role(GetUserId(), beentry->st_userid))
+ values[3] = CStringGetTextDatum(ped->plan);
+ else
+ values[3] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ return (Datum) 0;
+}
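
A note on the privilege check in pg_stat_progress_explain(): the full plan
text is only returned to roles with the privileges of pg_read_all_stats or
of the backend's own role; everyone else sees <insufficient privilege>.
A minimal sketch of a monitoring setup (the role name is just an example,
not part of the patch):

CREATE ROLE plan_monitor LOGIN;
GRANT pg_read_all_stats TO plan_monitor;
-- connected as plan_monitor:
SELECT pid, last_update, query_plan FROM pg_stat_progress_explain;
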
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index dd4cde41d32..2bb0ac7d286 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -24,6 +24,7 @@ backend_sources += files(
'explain.c',
'explain_dr.c',
'explain_format.c',
+ 'explain_progressive.c',
'explain_state.c',
'extension.c',
'foreigncmds.c',
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e9bd98c7738..cbb074eaa93 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain_progressive.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -157,6 +158,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Setup progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -182,6 +189,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Add a back-reference to the QueryDesc.
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -267,6 +279,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -516,6 +534,12 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /*
+ * Finish progressive explain if enabled.
+ */
+ if (progressive_explain != PROGRESSIVE_EXPLAIN_NONE)
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..82fe90bf231 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain_progressive.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -462,7 +464,17 @@ ExecProcNodeFirst(PlanState *node)
* have ExecProcNode() directly call the relevant function from now on.
*/
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ /*
+ * Use wrapper for progressive explains if enabled and the node
+ * belongs to the currently tracked query descriptor.
+ */
+ if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE &&
+ ProgressiveExplainIsActive(node->state->query_desc))
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..389f5b55831 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -302,6 +304,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up the progressive explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 5702c35bb91..7097312b1a8 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -177,6 +177,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 8164d0fbb4f..081966ca267 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -102,6 +102,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
/* not yet executed */
qd->already_executed = false;
+ /* null until set by progressive explains */
+ qd->pestate = NULL;
+
return qd;
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 9fa12a555e8..b53bc61d0d8 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -349,6 +349,7 @@ DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
AioWorkerSubmissionQueue "Waiting to access AIO worker submission queue."
+ProgressiveExplainHash "Waiting to access backend progressive explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7958ea11b73..3b55a061f3e 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain_progressive.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -82,6 +83,8 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainStartupTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -771,6 +774,10 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ ProgressiveExplainStartupTimeoutHandler);
}
/*
@@ -1432,6 +1439,18 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainStartupTimeoutHandler(void)
+{
+ ProgressiveExplainTrigger();
+}
+
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 989825d3a9c..2a5f5058bda 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -41,6 +41,8 @@
#include "commands/async.h"
#include "commands/extension.h"
#include "commands/event_trigger.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -479,6 +481,14 @@ static const struct config_enum_entry wal_compression_options[] = {
{NULL, 0, false}
};
+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -533,6 +543,16 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+int progressive_explain = PROGRESSIVE_EXPLAIN_NONE;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = true;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+bool progressive_explain_costs = true;
+int progressive_explain_min_duration = 1000;
+int progressive_explain_interval = 1000;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2141,6 +2161,72 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on buffer usage is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_costs", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on the estimated startup and total cost of each plan node is added to progressive explains."),
+ gettext_noop("Equivalent to the COSTS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_costs,
+ true,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3858,6 +3944,30 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 1000, 10, INT_MAX,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_min_duration", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Min query duration to start printing instrumented "
+ "progressive explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_min_duration,
+ 1000, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5406,6 +5516,26 @@ struct config_enum ConfigureNamesEnum[] =
NULL, assign_io_method, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain.")
+ },
+ &progressive_explain,
+ PROGRESSIVE_EXPLAIN_NONE, progressive_explain_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 0b9e3066bde..89aa80afb5c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -670,6 +670,20 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_min_duration = 1s
+#progressive_explain_interval = 1s
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = on
+#progressive_explain_costs = on
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
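
To illustrate how the settings above combine (an example session, not part
of the patch): an instrumented progressive explain with a shorter print
interval, buffer tracking, and JSON output could be configured as:

SET progressive_explain = 'analyze';
SET progressive_explain_min_duration = '5s';
SET progressive_explain_interval = '500ms';
SET progressive_explain_buffers = on;
SET progressive_explain_format = 'json';
-- then run the long query; a second session reads pg_stat_progress_explain
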
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8b68b16d79d..69092d9ccc8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12493,4 +12493,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => '',
+ proallargtypes => '{oid,int4,timestamptz,text}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{datid,pid,last_update,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain_progressive.h b/src/include/commands/explain_progressive.h
new file mode 100644
index 00000000000..ae0046aaf90
--- /dev/null
+++ b/src/include/commands/explain_progressive.h
@@ -0,0 +1,57 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.h
+ * prototypes for explain_progressive.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * src/include/commands/explain_progressive.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXPLAIN_PROGRESSIVE_H
+#define EXPLAIN_PROGRESSIVE_H
+
+#include "datatype/timestamp.h"
+#include "executor/executor.h"
+
+typedef enum ProgressiveExplain
+{
+ PROGRESSIVE_EXPLAIN_NONE,
+ PROGRESSIVE_EXPLAIN_EXPLAIN,
+ PROGRESSIVE_EXPLAIN_ANALYZE,
+} ProgressiveExplain;
+
+typedef struct ProgressiveExplainHashEntry
+{
+ int pid; /* hash key of entry - MUST BE FIRST */
+ int plan_alloc_size;
+ dsa_handle pe_h;
+ dsa_pointer pe_p;
+} ProgressiveExplainHashEntry;
+
+typedef struct ProgressiveExplainHashData
+{
+ int pid;
+ TimestampTz last_update;
+ char plan[FLEXIBLE_ARRAY_MEMBER];
+} ProgressiveExplainHashData;
+
+extern bool ProgressiveExplainIsActive(QueryDesc *queryDesc);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+/* transaction cleanup code */
+extern void AtAbort_ProgressiveExplain(void);
+extern void AtSubAbort_ProgressiveExplain(void);
+
+extern bool ProgressiveExplainPending;
+
+#endif /* EXPLAIN_PROGRESSIVE_H */
diff --git a/src/include/commands/explain_state.h b/src/include/commands/explain_state.h
index 32728f5d1a1..64370a5d6ea 100644
--- a/src/include/commands/explain_state.h
+++ b/src/include/commands/explain_state.h
@@ -16,6 +16,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "parser/parse_node.h"
+#include "utils/dsa.h"
typedef enum ExplainSerializeOption
{
@@ -74,6 +75,14 @@ typedef struct ExplainState
/* extensions */
void **extension_state;
int extension_state_allocated;
+ /* set if tracking a progressive explain */
+ bool progressive;
+ /* current plan node in progressive explains */
+ struct PlanState *pe_curr_node;
+ /* reusable instr object used in progressive explains */
+ struct Instrumentation *pe_local_instr;
+ /* dsa area used to store progressive explain data */
+ dsa_area *pe_a;
} ExplainState;
typedef void (*ExplainOptionHandler) (ExplainState *, DefElem *, ParseState *);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..27692aee542 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pestate; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e42f9f9f957..c863cb8f032 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -763,6 +764,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1159,6 +1163,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+ ExecProcNodeMtd ExecProcNodeOriginal; /* pointer to the original function
+ * when another wrapper is added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index ffa03189e2d..f3499d307f4 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -218,6 +218,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_SUBTRANS_SLRU,
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 932024b1b0b..7d88e7e9b58 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -84,3 +84,4 @@ PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
PG_LWLOCK(53, AioWorkerSubmissionQueue)
+PG_LWLOCK(54, ProgressiveExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index f619100467d..cff5c1f4cdb 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,16 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT int progressive_explain;
+extern PGDLLIMPORT int progressive_explain_min_duration;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
+extern PGDLLIMPORT bool progressive_explain_costs;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
@@ -322,6 +332,7 @@ extern PGDLLIMPORT const struct config_enum_entry io_method_options[];
extern PGDLLIMPORT const struct config_enum_entry recovery_target_action_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_level_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_sync_method_options[];
+extern PGDLLIMPORT const struct config_enum_entry explain_format_options[];
/*
* Functions exported by guc.c
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..ea66a0505d9 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,8 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_STARTUP_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..05e555a5e26
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the local
+# session, even if progressive explain is enabled. For other sessions pg_stat_progress_explain
+# should contain data for a PID only if progressive_explain is enabled and a query
+# is running. Data needs to be removed when query finishes (or gets cancelled).
+
+use strict;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
+# Configure progressive explain to be logged immediately
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled pg_stat_progress_explain should not contain
+ # row for PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # 1 row for pid is expected for running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('explain');
+test_local_session('analyze');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('explain');
+test_peer_session('analyze');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 47478969135..62b70cf4618 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,13 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT s.datid,
+ d.datname,
+ s.pid,
+ s.last_update,
+ s.query_plan
+ FROM (pg_stat_progress_explain() s(datid, pid, last_update, query_plan)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 78e22a14f62..d9bdd7ce536 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2297,6 +2297,9 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplain
+ProgressiveExplainHashData
+ProgressiveExplainHashEntry
ProjectSet
ProjectSetPath
ProjectSetState
--
2.43.0
On Wed, Mar 26, 2025 at 5:58 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
So implementation was done based on transaction nested level. Cleanup is only
performed when AbortSubTransaction() is called in the same transaction nested
level as the one where the query is running. This covers both PL/pgSQL exception
blocks and savepoints.
Right. I think this is much closer to being correct. However, I
actually think it should look more like this:
void
AtEOSubXact_ProgressiveExplain(bool isCommit, int nestDepth)
{
if (activeQueryDesc != NULL &&
activeQueryXactNestLevel >= nestDepth)
{
if (isCommit)
elog(WARNING, "leaked progressive explain query descriptor");
ProgressiveExplainCleanup(NULL);
}
}
By including the is-commit case in there, we can catch any bugs where
things aren't cleaned up properly before a transaction is committed.
We generally want to test >= nestDepth instead of == nestDepth in case
multiple subtransaction levels abort all at once; I'm not sure it
matters here, but even if it doesn't, it's best to be consistent with
the practice elsewhere. Having {Commit,Abort}SubTransaction pass the
nestDepth instead of calling GetCurrentTransactionNestLevel() also has
precedent e.g. see AtEOSubXact_HashTables.
As a further refinement, consider initializing
activeQueryXactNestLevel to -1 and resetting it to that value each
time you end a progressive EXPLAIN, so that activeQueryDesc != NULL if
and only if activeQueryXactNestLevel >= 0. Then the test above can be
simplified to just if (activeQueryXactNestLevel >= nestDepth).
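As a rough sketch of that refinement (assuming the statics live next to the function, keeping ProgressiveExplainCleanup(NULL) as in the snippet above, and assuming the cleanup resets both statics):

static QueryDesc *activeQueryDesc = NULL;
static int activeQueryXactNestLevel = -1;   /* -1 while nothing is tracked */

void
AtEOSubXact_ProgressiveExplain(bool isCommit, int nestDepth)
{
    /* activeQueryDesc != NULL if and only if activeQueryXactNestLevel >= 0 */
    if (activeQueryXactNestLevel >= nestDepth)
    {
        if (isCommit)
            elog(WARNING, "leaked progressive explain query descriptor");
        ProgressiveExplainCleanup(NULL);
    }
}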
The comment for ProgressiveExplainIsActive() is a copy-paste from
another function that you forgot to update. Also, the function body
could be reduced to a one-liner: return queryDesc == activeQueryDesc;
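For illustration, the one-liner form would be roughly (with activeQueryDesc being the static that tracks the currently active progressive explain):

bool
ProgressiveExplainIsActive(QueryDesc *queryDesc)
{
    return queryDesc == activeQueryDesc;
}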
There is a comment in ExplainOnePlan() that says "Handle progressive
explain cleanup manually if enabled as it is not called as part of
ExecutorFinish," but standard_ExecutorFinish does indeed call
ProgressiveExplainFinish(), so either the comment is misleading or the
code is wrong.
standard_ExecutorFinish() makes its call to ProgressiveExplainFinish()
dependent on the value of the progressive_explain GUC, but that GUC
could be changed in mid-query via set_config() or a plpgsql function
that calls SET, which could result in skipping the cleanup even when
it's needed. I think you should make the call unconditional and make
it entirely the job of ProgressiveExplainFinish() to decide whether
any cleanup is needed.
ProgressiveExplainFinish() calls ProgressiveExplainCleanup() in most
cases, but sometimes just disables the timeout instead. I think this
is weird. I think you should just always call
ProgressiveExplainCleanup() and make sure it's robust and cleans up
however much or little is appropriate in all cases.
On the flip side, I can't really see why
dsa_detach(queryDesc->pestate->pe_a) needs to be done while holding
ProgressiveExplainHashLock. Why not just have
ProgressiveExplainFinish() call ProgressiveExplainCleanup(), and then
afterwards it can do the dsa_detach() in the caller? Then
ProgressiveExplainCleanup() no longer needs an argument.
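Putting those two points together, the finish path could end up looking roughly like this; the timeout call and the exact division of labor with ProgressiveExplainCleanup() are assumptions based on this discussion, not the patch as posted:

void
ProgressiveExplainFinish(QueryDesc *queryDesc)
{
    /* Called unconditionally; do nothing unless this query is tracked. */
    if (!ProgressiveExplainIsActive(queryDesc))
        return;

    disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);

    /* Removes the shared hash entry, taking ProgressiveExplainHashLock itself. */
    ProgressiveExplainCleanup();

    /* Detaching from the DSA area does not require the hash lock. */
    dsa_detach(queryDesc->pestate->pe_a);
}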
ProgressiveExplainPrint() can save a level of indentation in a large
chunk of the function by understanding that elog(ERROR) does not
return. You don't need to wrap everything that follows in else {}.
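The pattern being suggested is simply (the entry variable here is made up for illustration):

    if (entry == NULL)
        elog(ERROR, "progressive explain hash entry not found");

    /*
     * elog(ERROR) longjmps out and never returns, so the rest of the
     * function can continue here without an else block or extra
     * indentation.
     */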
In the documentation table called pg-stat-progress-explain-view, some
descriptions end with a period and others do not.
Calling ProgressiveExplainTrigger() directly from
ProgressiveExplainStartupTimeoutHandler() seems extremely scary --
we've tried hard to make our signal handlers do only very simple
things like setting a flag, and this one runs around the entire
PlanState tree modifying Very Important Things. I fear that's going to
be hard to make robust. Like, what happens if we're going around
trying to change ExecProcNode pointers while the calling code was also
going around trying to change ExecProcNode pointers? I can't quite see
how this won't result in chaos.
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks Robert. Very thorough analysis there.
Things I don't comment on will be fixed without further discussion.
There is a comment in ExplainOnePlan() that says "Handle progressive
explain cleanup manually if enabled as it is not called as part of
ExecutorFinish," but standard_ExecutorFinish does indeed call
ProgressiveExplainFinish(), so either the comment is misleading or the
code is wrong.
The comment is misleading. What I meant to say is that queries executed via
EXPLAIN (without analyze) don't call ExecutorFinish, so ProgressiveExplainFinish isn't called in that context. I will fix the comment.
Calling ProgressiveExplainTrigger() directly from
ProgressiveExplainStartupTimeoutHandler() seems extremely scary --
we've tried hard to make our signal handlers do only very simple
things like setting a flag, and this one runs around the entire
PlanState tree modifying Very Important Things. I fear that's going to
be hard to make robust. Like, what happens if we're going around
trying to change ExecProcNode pointers while the calling code was also
going around trying to change ExecProcNode pointers? I can't quite see
how this won't result in chaos.
Agreed. In that other similar patch to log query plans after a signal is sent from another session there was the same discussion and concerns.
I don't see another way of doing it here. This patch became 100x more complex after I added GUC progressive_explain_min_duration, which required changing the execProcNode wrapper on the fly.
I can see some alternatives here:
A) Use a temporary execProcNode wrapper set at query start here:
/*
* If instrumentation is required, change the wrapper to one that just
* does instrumentation. Otherwise we can dispense with all wrappers and
* have ExecProcNode() directly call the relevant function from now on.
*/
if (node->instrument)
{
/*
* Use wrapper for progressive explains if enabled and the node
* belongs to the currently tracked query descriptor.
*/
if (progressive_explain == PROGRESSIVE_EXPLAIN_ANALYZE &&
ProgressiveExplainIsActive(node->state->query_desc))
node->ExecProcNode = ExecProcNodeInstrExplain;
else
node->ExecProcNode = ExecProcNodeInstr;
This wrapper will have additional logic that checks whether a boolean set by the timeout function has changed. When that happens, the initial progressive explain setup is done and execProcNode is unwrapped in place. This should be safe.
The problem here is that all queries would always be using the custom
wrapper until timeout is triggered, and that can add unnecessary overhead.
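A minimal sketch of what such a wrapper for (A) could look like, assuming the timeout handler only sets a flag; the flag name and the exact unwrapping step are made up for illustration:

static volatile sig_atomic_t progressive_explain_pending = false;

static TupleTableSlot *
ExecProcNodeInstrExplain(PlanState *node)
{
    TupleTableSlot *result;

    /*
     * The timeout handler only set a flag; the heavy setup happens here,
     * in the executor's normal code path, not in the signal handler.
     */
    if (progressive_explain_pending)
    {
        progressive_explain_pending = false;
        ProgressiveExplainTrigger();    /* prints the plan; may unwrap nodes in place */
    }

    InstrStartNode(node->instrument);
    result = node->ExecProcNodeReal(node);
    InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);

    return result;
}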
B) We get rid of the idea of applying progressive explains to non-instrumented runs (which was my original idea). My main use case here is to see progress of instrumented runs anyway. For that idea we have 2 possibilities:
B1) Keep implementation as is, with all the existing GUCs to control what is included in the printed plan. We just change GUC progressive_explain to be a boolean again. If true, instrumentation will be enabled for the query being executed and progressive explain will be printed consecutively.
B2) Get rid of all new GUCs I added and change the progressive explain feature to become an option of the EXPLAIN command. Something like:
EXPLAIN (ANALYZE, PROGRESSIVE) SELECT * FROM ...
(B1) would allow progressive explains in regular queries, but I'm not sure people would be willing to enable it globally as it adds instrumentation overhead.
What do you think of the options?
Rafael.
On Thu, Mar 27, 2025 at 9:38 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
Calling ProgressiveExplainTrigger() directly from
ProgressiveExplainStartupTimeoutHandler() seems extremely scary --
Agreed. In that other similar patch to log query plans after a signal is sent from another session there was the same discussion and concerns.
I don't see another way of doing it here. This patch became 100x more complex after I added GUC progressive_explain_min_duration, which required changing the execProcNode wrapper on the fly.
I can see some alternatives here:
A) Use a temporary execProcNode wrapper set at query start here:
The problem here is that all queries would always be using the custom wrapper until timeout is triggered, and that can add unnecessary overhead.
B) We get rid of the idea of applying progressive explains to non-instrumented runs (which was my original idea). My main use case here is to see progress of instrumented runs anyway. For that idea we have 2 possibilities:
B1) Keep implementation as is, with all the existing GUCs to control what is included in the printed plan. We just change GUC progressive_explain to be a boolean again. If true, instrumentation will be enabled for the query being executed and progressive explain will be printed consecutively.
B2) Get rid of all new GUCs I added and change the progressive explain feature to become an option of the EXPLAIN command. Something like: EXPLAIN (ANALYZE, PROGRESSIVE) SELECT * FROM ...
(B1) would allow progressive explains in regular queries, but I'm not sure people would be willing to enable it globally as it adds instrumentation overhead.
I'm inclined to think that there isn't much value to this feature with
progressive_explain=explain. If you just want to print plans after a
certain amount of elapsed runtime, you can already use auto_explain to
do that. Unless there's some pretty large benefit to doing it with
this feature over what that already does, I think we shouldn't invent
a second method. Granted, auto_explain is a contrib module and this is
proposed as a core feature, but I feel like that use case of printing
the plan once is so different from what I see as the core value
proposition of this feature that I think it might just be confusing to
include it in scope. There's nothing actually "progressive" about
that, because you're just doing it once.
But having said that, I'm not quite sure I understand why you're
proposing (A) and (B1) as separate alternatives. Changing
progressive_explain to be a Boolean doesn't seem like it solves the
problem of needing to wrap ExecProcNode from a signal handler. The
only thing that seems to solve that problem is to instead do the
wrapping at the start of the query, which AIUI is (A). So I feel like
you should do (A) to solve the problem I thought we were talking about
and (B1) to make things simpler. Am I misunderstanding the trade-offs
here?
I did consider proposing (B2) yesterday, not so much because of this
issue but because I wondered whether it might just be a better
interface. But on reflection I decided it wasn't, because it removes
the option to just turn this feature on for all of your queries, which
somebody might want to do. Also, it would actually be kind of a weird
interface, because every other form of EXPLAIN returns the plan as
output and throws away the original query output; but this form would
store the plan ephemerally in a view and return the original output.
I'm sure we could make that work and find a way to document it but it
seems odd.
In general, I think we should err on the side of making this feature
safe even at some performance cost. If we put in place a version of
this feature that imposes a great performance overhead but does not
crash the server, some people will be unhappy, complain, and/or be
unable to use the feature. That will be sad. But, if we put in place a
version of this feature that performs great and occasionally crashes
the server, that will be much sadder. So let's do something we're very
confident is safe. There is always the opportunity to change things in
the future if we're more confident about some questionable choice then
than we are now -- performance optimization can be done after the
fact. But if it's not stable, the only thing we're likely to change in
the future is to revert the whole thing.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 3/26/25 22:57, Rafael Thofehrn Castro wrote:
So implementation was done based on transaction nested level. Cleanup is only performed when AbortSubTransaction() is called in the same transaction nested level as the one where the query is running. This covers both PL/pgSQL exception blocks and savepoints.
Thanks for your efforts!
Let me provide an alternative view of your code.
Postgres has a massive gap between the start of the execution and the
end. All the stuff that happens at that time can't be discovered. That's
sad. If you add a new hook into the ExecuteNode (it may be designed as a
pointer in the PlanState to affect only necessary nodes), you may push
the code out of the core and give other extensions additional tools.
I see your reasoning here [1], but I think with the commit 4fd02bf7cf9 it should be revised.
As I see after reviewing your code, to design it as an extension, some
parallel query execution stuff needs to be exported, and Instrumentation
needs to be changed a little. For example, with progressive explain we
have a new node state - 'not visited yet' that is different from 'not
executed'.
What is the purpose of a new hook from a broad perspective? I designed
at least two features with such a hook in the Postgres fork: 1)
real-time execution statistics and 2) query interruption on an
arbitrary trigger (planning error detected, a signal arrived, too much
temp memory allocated, etc.). I am not sure if it would be interesting
for the community, but when a query is executed for more than one
minute, I certainly want to control what is happening and have some
tools other than query abortion.
The benefit of such an approach is that it is doable to change the
Instrumentation, add a hook now, and develop this extension without a
rush until it is stable - I think at least the case of parallel
execution may be enhanced.
[1]: /messages/by-id/CAG0ozMrtK_u8Uf5KNZUmRNuMphV5tnC5DEhRBNRGK+K4L506xw@mail.gmail.com
--
regards, Andrei Lepikhov
But having said that, I'm not quite sure I understand why you're
proposing (A) and (B1) as separate alternatives. Changing
progressive_explain to be a Boolean doesn't seem like it solves the
problem of needing to wrap ExecProcNode from a signal handler. The
only thing that seems to solve that problem is to instead do the
wrapping at the start of the query, which AIUI is (A). So I feel like
you should do (A) to solve the problem I thought we were talking about
and (B1) to make things simpler. Am I misunderstanding the trade-offs
here?
Both (A) and (B1) use the same strategy, which is to wrap at query start. What changes is that (A), where we keep the 'explain' option, allows users to still see the plan without having to include instrumentation overhead.
But (A) is doomed from the start as the custom wrapper will have custom logic that will add some overhead. That is what I was able to avoid with the current patch that does wrapping in the timeout handler function.
In (B1) it is OK to have a custom wrapper that does the progressive explain in fixed intervals, as that overhead is far less noticeable than the overhead in the already existing ExecProcNodeInstr wrapper. I tested that.
Notice that the gist of (B1) is already part of the existing patch. If progressive_explain is set to 'analyze' I set a wrapper ExecProcNodeInstrExplain at query start.
So I think the way to go is with (B1), where the ExecProcNodeInstrExplain wrapper will continue to do what it does, but only after progressive_explain_min_duration has passed (a boolean flag set by the timeout handler function). And I get rid of the 'explain'.
As you said, visibility on the non-instrumented query plan is already a feature of auto_explain.
The benefit of such an approach is that it is doable to change the
Instrumentation, add a hook now, and develop this extension without a
rush until it is stable - I think at least the case of parallel
execution may be enhanced.
Thanks for the input Andrei! The main issue is that progressive explain touches a bunch of different parts of the code that would require additional hooks. For example, to adjust the execProcNode wrapper at query start. There is also the hook for xact/subxact cleanups, which I don't think exists yet.
The biggest challenge is what lies inside ExplainNode(), an already complex
function that calls several other helper functions. Progressive explain had
to change several parts of it.
The only solution I found back then while thinking about this idea was to make ExplainNode itself a hook, and the extension would duplicate the whole code with the additional custom logic. And that is definitely not a good idea.
With the new hooks Robert added it is indeed a big step ahead, but we still need to deal with instrumentation and other nuances, as you said.
I am definitely not the authority here to talk about the best way forward.
If there is an agreement in turning this into an extension, it can be a new
feature in auto_explain.
Rafael.
On Fri, Mar 28, 2025 at 12:09 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
I am definitely not the authority here to talk about the best way forward.
If there is an agreement in turning this into an extension, it can be a new
feature in auto_explain.
I'm not against adding some more hooks to explain.c, but I don't
really see how you could add hooks to explain.c that would allow
somebody to implement this feature out of core, unless either (a) you
are thinking that they will copy and paste a lot of core code or (b)
you make the hooks extremely fine-grained and special-purpose,
designed to accomplish the extremely precise thing that this feature
would need them to do. But that just seems like bad hook design. I
don't think hooks have to be amazingly general in order to be
accepted, but if there's no use of the hook that doesn't involve a big
cut-and-paste operation, that's poor, and if the hook can only ever be
used to implement one thing, you might as well put the thing in core.
I feel like there are just two separate things here. There is Rafael's
progressive EXPLAIN patch which, as far as I can see currently, is
really only going to work as in-core feature, and then there is
Andrei's desire for more hooks, which may very well also be a good
idea but not the same idea.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Mar 28, 2025 at 12:09 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
As you said, visibility on the non instrumented query plan is already a feature
of auto_explain.
True, although thinking about it more, they're not being sent to the
same place. auto_explain goes to the log, and this goes to a view.
What about something like this:
progressive_explain = on | off
progressive_explain_interval = 5s
If progressive_explain is off, then we don't populate the view. If
it's on, then we populate the view during query startup. Every time
the interval elapses, we update it again. If you want to update the
view just once at query startup and not thereafter, you can set
progressive_explain_interval = 0.
--
Robert Haas
EDB: http://www.enterprisedb.com
True, although thinking about it more, they're not being sent to the
same place. auto_explain goes to the log, and this goes to a view.
What about something like this:
progressive_explain = on | off
progressive_explain_interval = 5s
If progressive_explain is off, then we don't populate the view. If
it's on, then we populate the view during query startup. Every time
the interval elapses, we update it again. If you want to update the
view just once at query startup and not thereafter, you can set
progressive_explain_interval = 0.
That also works. My concern is that with that approach there is a huge
"hidden" change of execution behavior between setting
progressive_explain_interval to 0 and something greater than that.
Setting to 0 doesn't do anything other than dumping the plain plan at
query start. Setting any other value will enable instrumentation under the
hood. This would have to be very well documented.
I'm still more inclined to use:
progressive_explain = off | explain | analyze
progressive_explain_interval = any millisecond greater than a min threshold
(currently 10ms). It doesn't make sense to be dumping the instrumented plan
every 1ms for example, IMHO.
On Fri, Mar 28, 2025 at 3:59 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
True, although thinking about it more, they're not being sent to the
same place. auto_explain goes to the log, and this goes to a view.
What about something like this:
progressive_explain = on | off
progressive_explain_interval = 5s
I'm still more inclined to use:
progressive_explain = off | explain | analyze
progressive_explain_interval = any millisecond greater than a min threshold
(currently 10ms). It doesn't make sense to be dumping the instrumented plan
every 1ms for example, IMHO.
I still have trouble understanding what that means. Is the interval
irrelevant except when progressive_explain = analyze?
--
Robert Haas
EDB: http://www.enterprisedb.com
I still have trouble understanding what that means. Is the interval
irrelevant except when progressive_explain = analyze?
That is correct. Interval currently is only used when instrumentation
is enabled through progressive_explain = analyze.
That is correct. Interval currently is only used when instrumentation
is enabled through progressive_explain = analyze.
I guess there would still be a point in printing the plan on a regular interval when instrumentation is disabled. In that case the only thing we would see changing in the plan is the node currently being executed. But with that we add more overhead to progressive_explain = 'explain', which in that case will also require a custom execProcNode wrapper.
On Fri, Mar 28, 2025 at 4:12 PM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
That is correct. Interval currently is only used when instrumentation
is enabled through progressive_explain = analyze.
I guess there would still be a point in printing the plan on a regular interval when instrumentation is disabled. In that case the only thing we would see changing in the plan is the node currently being executed. But with that we add more overhead to progressive_explain = 'explain', which in that case will also require a custom execProcNode wrapper.
I don't think that's at all worth it. I had even considered asking for
the current node stuff to be removed altogether. Doing regular updates
to only change that seems like a real waste. It's barely ever going to
be useful to see the node being executed right this exact instant -
what you care about is which nodes suck up resources over time.
As far as option naming is concerned, I'm open to input from others,
but personally I feel like ANALYZE is a bit opaque as an EXPLAIN
option in general. We've had previous discussions about the fact that
we might have wanted to name it EXECUTE if EXPLAIN EXECUTE didn't
already mean something else. I think for most people, what ANALYZE
means - somewhat counterintuitively - is that we're actually going to
run the query. But here we're always going to run the query, so my
brain just goes into a tailspin trying to figure out what ANALYZE
means. So, personally, I would like to see progressive_explain = off |
explain | analyze go away in favor of progressive_explain = off | on.
I think we have two ideas on how to get there so far:
(1) Just get rid of what you call progressive_explain = explain. After
all, that amounts to just serializing the plan once, and maybe that's
not actually such a great feature. With that change, then we only have
two remaining values for the progressive_explain, and we can call them
"off" and "on".
(2) Merge progressive_explain = explain and progressive_explain =
analyze into a single value, progressive_explain = on. To tell which
is intended, just check progressive_explain_interval. If it's zero,
then there's no repeat, so we mean what you currently call
progressive_explain = explain i.e. serialize once. Otherwise we mean
progressive_explain = analyze.
I'm open to idea #3, but I'm pretty resistant to the idea of
progressive_explain remaining three-valued as it is today. If 27
people show up to say that they understand what that means perfectly
and Robert's a big fat idiot, I shall of course defer to the
consensus, but I kind of think I might not be alone in finding this
naming confusing.
For what it's worth, I have trouble seeing how anyone gets confused if
we do what I propose as #2. I mean, I agree with your point that they
could misunderstand how much overhead different settings will cause,
so we would need to document that carefully. But if you're just
wondering about the behavior, then I feel like people have a pretty
good chance of guessing that progressive_explain=on,
progressive_explain_interval=0 means do it once. You could think that
means do it at maximum speed, but that seems like a somewhat naive
interpretation. You could also think it disables the feature,
effectively negating progressive_explain=on, but then you should start
to wonder why there are two GUCs. If you don't think either of those
things, then I think "do it once" is the only other reasonable
interpretation.
Of course, sometimes what I think turns out to be completely wrong!
--
Robert Haas
EDB: http://www.enterprisedb.com
(2) Merge progressive_explain = explain and progressive_explain =
analyze into a single value, progressive_explain = on. To tell which
is intended, just check progressive_explain_interval. If it's zero,
then there's no repeat, so we mean what you currently call
progressive_explain = explain i.e. serialize once. Otherwise we mean
progressive_explain = analyze.
Implemented this version. New patch has the following characteristics:
- progressive_explain is a boolean
- GUC progressive_explain_min_duration is removed
- if progressive_explain_interval = 0, update plan in memory only at query start
- if progressive_explain_interval > 0, share the plan in memory at query start and update it every progressive_explain_interval (instrumentation is enabled automatically)
- The plan shared in pg_stat_progress_explain at query start does not contain any instrumentation detail ((never executed), (actual time), etc.) if progressive_explain_interval = 0, even if the query is started with EXPLAIN ANALYZE. My reasoning here is that if the plan will be shared only once we are actually interested in seeing the plan itself and not instrumentation progress.
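For reference, the startup decision described above could look roughly like the following; the exact placement (executor startup) and the call to ProgressiveExplainPrint() are assumptions based on this thread, while the GUCs, instrument flags and timeout primitives already exist in the patch or in core:

    if (progressive_explain)
    {
        if (progressive_explain_interval == 0)
        {
            /* One-shot: print the plain plan once, no instrumentation. */
            ProgressiveExplainPrint(queryDesc);
        }
        else
        {
            /* Recurring: add instrumentation and re-print on a timer. */
            queryDesc->instrument_options |= INSTRUMENT_ROWS;
            if (progressive_explain_timing)
                queryDesc->instrument_options |= INSTRUMENT_TIMER;
            if (progressive_explain_buffers)
                queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
            if (progressive_explain_wal)
                queryDesc->instrument_options |= INSTRUMENT_WAL;

            ProgressiveExplainPrint(queryDesc);
            enable_timeout_after(PROGRESSIVE_EXPLAIN_TIMEOUT,
                                 progressive_explain_interval);
        }
    }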
Fixed other comments you shared in previous emails too:
void
AtEOSubXact_ProgressiveExplain(bool isCommit, int nestDepth)
{
if (activeQueryDesc != NULL &&
activeQueryXactNestLevel >= nestDepth)
{
if (isCommit)
elog(WARNING, "leaked progressive explain query descriptor");
ProgressiveExplainCleanup(NULL);
}
}
By including the is-commit case in there, we can catch any bugs where
things aren't cleaned up properly before a transaction is committed.
Added the isCommit logic to both transaction and subtransaction commits and aborts, which will warn about leakage if cleanup was not properly done at commit time. Changed function names back to AtEOXact and AtEOSubXact, as opposed to AtAbort and AtSubAbort.
We generally want to test >= nestDepth instead of == nestDepth in case
multiple subtransaction levels abort all at once; I'm not sure it
matters here, but even if it isn't, it's best to be consistent with
the practice elsewhere. Having {Commit,Abort}SubTransaction pass the
nestDepth instead of calling GetCurrentTransactionNestLevel() also has
precedent e.g. see AtEOSubXact_HashTables.
Done.
As a further refinement, consider initializing
activeQueryXactNestLevel to -1 and resetting it to that value each
time you end a progressive EXPLAIN, so that activeQueryDesc != NULL if
and only if activeQueryXactNestLevel >= 0. Then the test above can be
simplified to just if (activeQueryXactNestLevel >= nestDepth).
Done.
standard_ExecutorFinish() makes its call to ProgressiveExplainFinish()
dependent on the value of the progressive_explain GUC, but that GUC
could be changed in mid-query via set_config() or a plpgsql function
that calls SET, which could result in skipping the cleanup even when
it's needed. I think you should make the call unconditional and make
it entirely the job of ProgressiveExplainFinish() to decide whether
any cleanup is needed.
Done.
ProgressiveExplainFinish() calls ProgressiveExplainCleanup() in most
cases, but sometimes just disables the timeout instead. I think this
is weird. I think you should just always call
ProgressiveExplainCleanup() and make sure it's robust and cleans up
however much or little is appropriate in all cases.
Now that I removed all the execProcNode wrapping and the conditional cleanup based on progressive_explain_min_duration (which is gone), this part became much simpler. ProgressiveExplainFinish() always calls ProgressiveExplainCleanup().
On the flip side, I can't really see why
dsa_detach(queryDesc->pestate->pe_a) needs to be done while holding
ProgressiveExplainHashLock. Why not just have
ProgressiveExplainFinish() call ProgressiveExplainCleanup(), and then
afterwards it can do the dsa_detach() in the caller? Then
ProgressiveExplainCleanup() no longer needs an argument.
That is perfect. Implemented.
ProgressiveExplainPrint() can save a level of indentation in a large
chunk of the function by understanding that elog(ERROR) does not
return. You don't need to wrap everything that follows in else {}.
Fixed.
I will not fix documentation for now as we are not done with
implementation changes yet. Once we agree that the code logic is
safe and sound we can discuss cosmetics (feature name, GUCs, view, etc).
Rafael.
Attachments:
v14-0001-Proposal-for-progressive-explains.patch (application/octet-stream)
From a01827ed49c7a5bc78daa942947d5b758d2b9e83 Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Sat, 29 Mar 2025 14:19:46 -0300
Subject: [PATCH v14] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize via new view pg_stat_progress_explain.
Plans are only printed if new GUC parameter progressive_explain is
enabled.
When new GUC progressive_explain_interval is set to 0 the plan will be
printed only at query start. If set to any other value the QueryDesc
will be adjusted adding instrumentation flags. In that case the plan
will be printed on a fixed interval controlled by progressive_explain_interval
including all instrumentation stats computed so far (per node rows and
execution time).
New view:
- pg_stat_progress_explain
- datid: OID of the database
- datname: Name of the database
- pid: PID of the process running the query
- last_update: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: bool
- default: off
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 0
- min: 0
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [TEXT, XML, JSON, or YAML]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: true
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_costs: controls whether estimated startup and
total cost of each plan node is added to the printed plan.
- type: bool
- default: true
- context: user
---
contrib/auto_explain/auto_explain.c | 10 +-
doc/src/sgml/config.sgml | 156 ++++++
doc/src/sgml/monitoring.sgml | 82 +++
doc/src/sgml/perform.sgml | 97 ++++
src/backend/access/transam/xact.c | 15 +
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/Makefile | 1 +
src/backend/commands/explain.c | 220 +++++---
src/backend/commands/explain_format.c | 12 +
src/backend/commands/explain_progressive.c | 493 ++++++++++++++++++
src/backend/commands/meson.build | 1 +
src/backend/executor/execMain.c | 21 +
src/backend/executor/execProcnode.c | 16 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 1 +
src/backend/tcop/pquery.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 10 +
src/backend/utils/misc/guc_tables.c | 110 ++++
src/backend/utils/misc/postgresql.conf.sample | 13 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain_progressive.h | 50 ++
src/include/commands/explain_state.h | 9 +
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/utils/guc.h | 10 +
src/include/utils/timeout.h | 1 +
.../test_misc/t/008_progressive_explain.pl | 128 +++++
src/test/regress/expected/rules.out | 7 +
src/tools/pgindent/typedefs.list | 2 +
34 files changed, 1441 insertions(+), 85 deletions(-)
create mode 100644 src/backend/commands/explain_progressive.c
create mode 100644 src/include/commands/explain_progressive.h
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index cd6625020a7..0d28ae2ffe1 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -42,14 +42,6 @@ static int auto_explain_log_level = LOG;
static bool auto_explain_log_nested_statements = false;
static double auto_explain_sample_rate = 1;
-static const struct config_enum_entry format_options[] = {
- {"text", EXPLAIN_FORMAT_TEXT, false},
- {"xml", EXPLAIN_FORMAT_XML, false},
- {"json", EXPLAIN_FORMAT_JSON, false},
- {"yaml", EXPLAIN_FORMAT_YAML, false},
- {NULL, 0, false}
-};
-
static const struct config_enum_entry loglevel_options[] = {
{"debug5", DEBUG5, false},
{"debug4", DEBUG4, false},
@@ -191,7 +183,7 @@ _PG_init(void)
NULL,
&auto_explain_log_format,
EXPLAIN_FORMAT_TEXT,
- format_options,
+ explain_format_options,
PGC_SUSET,
0,
NULL,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 65ab95be370..c96206e3275 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8682,6 +8682,162 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ When set to <literal>explain</literal> the plan will be printed only
+ once after <xref linkend="guc-progressive-explain-min-duration"/>. If
+ set to <literal>analyze</literal>, instrumentation flags are enabled,
+ causing the plan to be printed on a fixed interval controlled by
+ <xref linkend="guc-progressive-explain-interval"/> including all
+ instrumentation stats computed so far.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the threshold (in milliseconds) until progressive explain is
+ printed for the first time. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on buffer usage is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on per node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-costs" xreflabel="progressive_explain_costs">
+ <term><varname>progressive_explain_costs</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_costs</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on the estimated startup and total cost of
+ each plan node is added to progressive explains. Equivalent to the COSTS
+ option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bacc09cb8af..e5497271eaa 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6842,6 +6842,88 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describes the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_update</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 387baac7e8c..04a78f29df9 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1169,6 +1169,103 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query can
+ be visualized by any session via <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query and after the min duration
+ specified by <xref linkend="guc-progressive-explain-min-duration"/> has
+ passed. Settings <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/>,
+ <xref linkend="guc-progressive-explain-settings"/> and
+ <xref linkend="guc-progressive-explain-costs"/>
+ define how the plan is printed and which details are added there.
+ </para>
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:01.606324-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ will enable instrumentation and the detailed plan is printed on a fixed interval
+ controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+ per node accumulated row count and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional information per node to help analyzing
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:53.951884-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=1581.469..2963.158 rows=64862.00 loops=1) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.079..486.702 rows=8258962.00 loops=1) (current)+
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=1580.933..1580.933 rows=10000000.00 loops=1) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.004..566.961 rows=10000000.00 loops=1) +
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b885513f765..28294d802e1 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,6 +36,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "common/pg_prng.h"
@@ -2423,6 +2424,12 @@ CommitTransaction(void)
/* Clean up the type cache */
AtEOXact_TypeCache();
+ /*
+ * If progressive explain wasn't properly cleaned up after the query ended,
+ * perform the cleanup and warn about leaked resources.
+ */
+ AtEOXact_ProgressiveExplain(true);
+
/*
* Make catalog changes visible to all backends. This has to happen after
* relcache references are dropped (see comments for
@@ -2993,6 +3000,7 @@ AbortTransaction(void)
AtEOXact_PgStat(false, is_parallel_worker);
AtEOXact_ApplyLauncher(false);
AtEOXact_LogicalRepWorkers(false);
+ AtEOXact_ProgressiveExplain(false);
pgstat_report_xact_timestamp(0);
}
@@ -5193,6 +5201,12 @@ CommitSubTransaction(void)
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
+ /*
+ * If progressive explain wasn't properly cleaned up after the subxact ended,
+ * perform the cleanup and warn about leaked resources.
+ */
+ AtEOSubXact_ProgressiveExplain(true, s->nestingLevel);
+
/*
* We need to restore the upper transaction's read-only state, in case the
* upper is read-write while the child is read-only; GUC will incorrectly
@@ -5361,6 +5375,7 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
+ AtEOSubXact_ProgressiveExplain(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31d269b7ee0..767735c1a2c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,16 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ S.datid AS datid,
+ D.datname AS datname,
+ S.pid,
+ S.last_update,
+ S.query_plan
+ FROM pg_stat_progress_explain() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index cb2fbdc7c60..e10224b2cd2 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -36,6 +36,7 @@ OBJS = \
explain.o \
explain_dr.o \
explain_format.o \
+ explain_progressive.o \
explain_state.o \
extension.o \
foreigncmds.o \
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ef8aa489af8..179a1f8792b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -20,6 +20,7 @@
#include "commands/explain.h"
#include "commands/explain_dr.h"
#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
#include "commands/explain_state.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
@@ -139,7 +140,7 @@ static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -596,6 +597,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
/* We can't run ExecutorEnd 'till we're done printing the stats... */
totaltime += elapsed_time(&starttime);
}
+ else
+ {
+ /*
+ * Handle progressive explain cleanup manually if enabled, as it is not
+ * called as part of ExecutorFinish.
+ */
+ if (progressive_explain)
+ ProgressiveExplainFinish(queryDesc);
+ }
/* grab serialization metrics before we destroy the DestReceiver */
if (es->serialize != EXPLAIN_SERIALIZE_NONE)
@@ -1371,6 +1381,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1834,53 +1845,90 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains, instrumentation cleanup is performed directly on
+ * the main instrumentation objects. Progressive explains need to clone the
+ * instrumentation object and forcibly end the loop in nodes that may still
+ * be running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
-
- if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
-
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
{
- appendStringInfo(es->str, " (actual ");
-
- if (es->timing)
- appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
- appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
}
+ /* Use main instrumentation */
else
{
- if (es->timing)
- {
- ExplainPropertyFloat("Actual Startup Time", "ms", startup_ms,
- 3, es);
- ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
- 3, es);
- }
- ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
- ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
}
}
- else if (es->analyze)
+
+ /*
+ * Additional query execution details should only be included if
+ * instrumentation is enabled and, if progressive explain is enabled, it
+ * is configured to update the plan more than once.
+ */
+ if (es->analyze &&
+ (!es->progressive ||
+ (es->progressive && progressive_explain_interval > 0)))
{
- if (es->format == EXPLAIN_FORMAT_TEXT)
- appendStringInfoString(es->str, " (never executed)");
+ if (local_instr && local_instr->nloops > 0)
+ {
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ appendStringInfo(es->str, " (actual ");
+
+ if (es->timing)
+ appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
+
+ appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
+ }
+ else
+ {
+ if (es->timing)
+ {
+ ExplainPropertyFloat("Actual Startup Time", "ms", startup_ms,
+ 3, es);
+ ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
+ 3, es);
+ }
+ ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
+ ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
+ }
+ }
else
{
- if (es->timing)
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ appendStringInfoString(es->str, " (never executed)");
+ else
{
- ExplainPropertyFloat("Actual Startup Time", "ms", 0.0, 3, es);
- ExplainPropertyFloat("Actual Total Time", "ms", 0.0, 3, es);
+ if (es->timing)
+ {
+ ExplainPropertyFloat("Actual Startup Time", "ms", 0.0, 3, es);
+ ExplainPropertyFloat("Actual Total Time", "ms", 0.0, 3, es);
+ }
+ ExplainPropertyFloat("Actual Rows", NULL, 0.0, 0, es);
+ ExplainPropertyFloat("Actual Loops", NULL, 0.0, 0, es);
}
- ExplainPropertyFloat("Actual Rows", NULL, 0.0, 0, es);
- ExplainPropertyFloat("Actual Loops", NULL, 0.0, 0, es);
}
}
@@ -1970,13 +2018,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_indexsearches_info(planstate, es);
break;
case T_IndexOnlyScan:
@@ -1984,16 +2032,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
show_indexsearches_info(planstate, es);
break;
case T_BitmapIndexScan:
@@ -2006,11 +2054,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2027,7 +2075,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2038,7 +2086,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2062,7 +2110,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2096,7 +2144,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2110,7 +2158,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2128,7 +2176,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2145,14 +2193,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2162,7 +2210,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2172,11 +2220,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2185,11 +2233,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2198,11 +2246,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2210,7 +2258,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_window_def(castNode(WindowAggState, planstate), ancestors, es);
@@ -2219,7 +2267,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
break;
case T_Group:
@@ -2227,7 +2275,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2249,7 +2297,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2294,10 +2342,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3975,19 +4023,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4668,7 +4716,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4677,11 +4725,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4701,11 +4762,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
diff --git a/src/backend/commands/explain_format.c b/src/backend/commands/explain_format.c
index 752691d56db..c0d6973d1e5 100644
--- a/src/backend/commands/explain_format.c
+++ b/src/backend/commands/explain_format.c
@@ -16,6 +16,7 @@
#include "commands/explain.h"
#include "commands/explain_format.h"
#include "commands/explain_state.h"
+#include "utils/guc_tables.h"
#include "utils/json.h"
#include "utils/xml.h"
@@ -25,6 +26,17 @@
#define X_CLOSE_IMMEDIATE 2
#define X_NOWHITESPACE 4
+/*
+ * GUC support
+ */
+const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
diff --git a/src/backend/commands/explain_progressive.c b/src/backend/commands/explain_progressive.c
new file mode 100644
index 00000000000..eddd57ccf50
--- /dev/null
+++ b/src/backend/commands/explain_progressive.c
@@ -0,0 +1,493 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.c
+ * Code for the progressive explain feature
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/explain_progressive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "catalog/pg_authid.h"
+#include "commands/explain.h"
+#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "utils/acl.h"
+#include "utils/backend_status.h"
+#include "utils/builtins.h"
+#include "utils/guc_tables.h"
+#include "utils/timeout.h"
+
+
+#define PROGRESSIVE_EXPLAIN_FREE_SIZE 4096
+
+/* Shared hash to store progressive explains */
+static HTAB *progressiveExplainHash = NULL;
+
+/* Pointer to the tracked query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Transaction nest level of the tracked query */
+static int activeQueryXactNestLevel = -1;
+
+/* Flag set by timeouts to control when to update progressive explains */
+bool ProgressiveExplainPending = false;
+
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(void);
+
+
+
+/*
+ * ProgressiveExplainSetup -
+ * Track query descriptor and adjust instrumentation.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details, we adjust QueryDesc accordingly even if
+ * the query was not initiated with EXPLAIN ANALYZE. This will
+ * directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Set up only if this is the outermost query */
+ if (activeQueryDesc == NULL)
+ {
+ activeQueryDesc = queryDesc;
+ activeQueryXactNestLevel = GetCurrentTransactionNestLevel();
+
+ /*
+ * Enable instrumentation if the plan will be updated more than once.
+ */
+ if (progressive_explain_interval > 0)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+ }
+}
+
+/*
+ * ProgressiveExplainStart -
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ * We define a ExplainState that will be reused in every iteration of
+ * plan updates.
+ *
+ * Progressive explain plans are updated in shared memory via DSAs. Each
+ * backend updating plans has its own DSA, shared with other backends by
+ * storing its dsa_handle and dsa_pointer in the global progressive
+ * explain hash.
+ *
+ * A periodic timeout is configured to update the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise
+ * the plan is updated only once.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+ ProgressiveExplainHashEntry *entry;
+ bool found;
+
+ /* Initialize ExplainState to be used for all plan updates */
+ es = NewExplainState();
+ queryDesc->pestate = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that ExplainNode() uses special
+ * logic when printing the plan.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain_interval > 0));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+ es->costs = progressive_explain_costs;
+
+ /* Create the DSA and share it through the hash */
+ es->pe_a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+
+ /* Find or create an entry with desired hash code */
+ entry = (ProgressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_ENTER,
+ &found);
+
+ entry->pe_h = dsa_get_handle(es->pe_a);
+ entry->pe_p = InvalidDsaPointer;
+
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+ /* Print progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pestate->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pestate->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint -
+ * Updates progressive explain in memory.
+ *
+ * This function resets the reusable ExplainState, updates the
+ * plan and updates the DSA with new data.
+ *
+ * DSA memory allocation is also done here. The amount of shared
+ * memory allocated depends on the size of the current plan.
+ * There may be reallocations in subsequent calls if new plans
+ * don't fit in the existing area.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+ QueryDesc *currentQueryDesc = queryDesc;
+ ProgressiveExplainHashEntry *entry;
+ ProgressiveExplainHashData *pe_data;
+ ExplainState *es = queryDesc->pestate;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /* Exclusive access is needed to update the hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+ entry = (ProgressiveExplainHashEntry *) hash_search(progressiveExplainHash,
+ &MyProcPid,
+ HASH_FIND,
+ NULL);
+
+ /* Entry must already exist in shared memory at this point */
+ if (!entry)
+ elog(ERROR, "no entry in progressive explain hash for pid %d",
+ MyProcPid);
+
+ /* Plan was never printed */
+ if (!entry->pe_p)
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(es->pe_a,
+ entry->pe_p);
+
+ /*
+ * Plan does not fit in existing shared memory area. Reallocation is
+ * needed.
+ */
+ if (strlen(es->str->data) > entry->plan_alloc_size)
+ {
+ dsa_free(es->pe_a, entry->pe_p);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently printed
+ * query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_FREE_SIZE. This strategy prevents having to
+ * reallocate the segment very often, which would be needed whenever
+ * the length of the next printed plan exceeds the previously allocated
+ * size.
+ */
+ entry->plan_alloc_size = add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_FREE_SIZE);
+ entry->pe_p = dsa_allocate(es->pe_a,
+ add_size(sizeof(ProgressiveExplainHashData),
+ entry->plan_alloc_size));
+ pe_data = dsa_get_address(es->pe_a, entry->pe_p);
+ pe_data->pid = MyProcPid;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_update = GetCurrentTimestamp();
+
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ProgressiveExplainFinish -
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ /*
+ * Progressive explain is only done for the outermost query descriptor.
+ */
+ if (queryDesc == activeQueryDesc)
+ {
+ ProgressiveExplainCleanup();
+ dsa_detach(queryDesc->pestate->pe_a);
+ }
+}
+
+/*
+ * ProgressiveExplainIsActive -
+ * Checks if argument queryDesc is the one being tracked.
+ */
+bool
+ProgressiveExplainIsActive(QueryDesc *queryDesc)
+{
+ return queryDesc == activeQueryDesc;
+}
+
+/*
+ * End-of-transaction cleanup for progressive explains.
+ */
+void
+AtEOXact_ProgressiveExplain(bool isCommit)
+{
+ /* Only perform cleanup if query descriptor is being tracked */
+ if (activeQueryDesc != NULL)
+ {
+ if (isCommit)
+ elog(WARNING, "leaked progressive explain query descriptor");
+ ProgressiveExplainCleanup();
+ }
+}
+
+/*
+ * End-of-subtransaction cleanup for progressive explains.
+ */
+void
+AtEOSubXact_ProgressiveExplain(bool isCommit, int nestDepth)
+{
+ /*
+ * Only perform cleanup if progressive explain is enabled
+ * (activeQueryXactNestLevel != -1) and the transaction nesting level of
+ * the aborted subtransaction is greater than or equal to the level of
+ * the tracked query. This is to avoid doing cleanup in subtransaction
+ * aborts triggered by exception blocks in functions and procedures.
+ */
+ if (activeQueryXactNestLevel >= nestDepth)
+ {
+ if (isCommit)
+ elog(WARNING, "leaked progressive explain query descriptor");
+ ProgressiveExplainCleanup();
+ }
+}
+
+/*
+ * ProgressiveExplainCleanup -
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * We need to deal with structures not automatically released by the memory
+ * context removal. Current tasks are:
+ * - remove local backend from progressive explain hash
+ * - detach from DSA used to store shared data
+ */
+void
+ProgressiveExplainCleanup(void)
+{
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ /* Reset querydesc tracker and nested level */
+ activeQueryDesc = NULL;
+ activeQueryXactNestLevel = -1;
+
+ /* Remove backend from the shared hash */
+ LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
+ hash_search(progressiveExplainHash, &MyProcPid, HASH_REMOVE, NULL);
+ LWLockRelease(ProgressiveExplainHashLock);
+}
+
+/*
+ * ExecProcNodeInstrExplain -
+ * ExecProcNode wrapper that performs instrumentation calls and updates
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when progressive explain is enabled.
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+ /*
+ * Update progressive explain after the timeout is reached.
+ */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * ProgressiveExplainHashShmemSize
+ * Compute shared memory space needed for explain hash.
+ */
+Size
+ProgressiveExplainHashShmemSize(void)
+{
+ Size size = 0;
+ long max_table_size;
+
+ max_table_size = add_size(MaxBackends,
+ max_parallel_workers);
+ size = add_size(size,
+ hash_estimate_size(max_table_size,
+ sizeof(ProgressiveExplainHashEntry)));
+
+ return size;
+}
+
+/*
+ * InitProgressiveExplainHash -
+ * Initialize hash used to store data of progressive explains.
+ */
+void
+InitProgressiveExplainHash(void)
+{
+ HASHCTL info;
+
+ info.keysize = sizeof(int);
+ info.entrysize = sizeof(ProgressiveExplainHashEntry);
+
+ progressiveExplainHash = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);
+}
+
+/*
+ * pg_stat_progress_explain -
+ * Return the progress of progressive explains.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 4
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ HASH_SEQ_STATUS hash_seq;
+ ProgressiveExplainHashEntry *entry;
+ dsa_area *a;
+ ProgressiveExplainHashData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ LWLockAcquire(ProgressiveExplainHashLock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, progressiveExplainHash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(entry->pe_p) ||
+ MyProcPid == entry->pid)
+ continue;
+
+ a = dsa_attach(entry->pe_h);
+ ped = dsa_get_address(a, entry->pe_p);
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ if (beentry->st_procpid == ped->pid)
+ {
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+ values[1] = ped->pid;
+ values[2] = TimestampTzGetDatum(ped->last_update);
+
+ if (has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) ||
+ has_privs_of_role(GetUserId(), beentry->st_procpid))
+ values[3] = CStringGetTextDatum(ped->plan);
+ else
+ values[3] = CStringGetTextDatum("<insufficient privilege>");
+ break;
+ }
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ dsa_detach(a);
+
+ }
+ LWLockRelease(ProgressiveExplainHashLock);
+
+ return (Datum) 0;
+}
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index dd4cde41d32..2bb0ac7d286 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -24,6 +24,7 @@ backend_sources += files(
'explain.c',
'explain_dr.c',
'explain_format.c',
+ 'explain_progressive.c',
'explain_state.c',
'extension.c',
'foreigncmds.c',
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2da848970be..89e9fb1bb04 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain_progressive.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -160,6 +161,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Set up progressive explain if enabled.
+ */
+ if (progressive_explain)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -185,6 +192,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+ * Add a back-reference to the QueryDesc.
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -270,6 +282,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -519,6 +537,9 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /* Finish progressive explain if enabled */
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..7ca0544e45e 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain_progressive.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -462,7 +464,19 @@ ExecProcNodeFirst(PlanState *node)
* have ExecProcNode() directly call the relevant function from now on.
*/
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ /*
+ * Use instrumented wrapper for progressive explains only if the
+ * feature is enabled, is configured to update the plan more than once
+ * and the node belongs to the currently tracked query descriptor.
+ */
+ if (progressive_explain &&
+ progressive_explain_interval > 0 &&
+ ProgressiveExplainIsActive(node->state->query_desc))
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..389f5b55831 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, ProgressiveExplainHashShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -302,6 +304,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+ * Set up progressive explain hash table
+ */
+ InitProgressiveExplainHash();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3df29658f18..5b913e2eff0 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -178,6 +178,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
[LWTRANCHE_AIO_URING_COMPLETION] = "AioUringCompletion",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 8164d0fbb4f..081966ca267 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -102,6 +102,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
/* not yet executed */
qd->already_executed = false;
+ /* null until set by progressive explains */
+ qd->pestate = NULL;
+
return qd;
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 4f44648aca8..c00f985b5b8 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -351,6 +351,7 @@ DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
AioWorkerSubmissionQueue "Waiting to access AIO worker submission queue."
+ProgressiveExplainHash "Waiting to access backend progressive explain shared hash table."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7958ea11b73..e070509b403 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain_progressive.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -82,6 +83,7 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -771,6 +773,8 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
}
/*
@@ -1432,6 +1436,12 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 76c7c6bb4b1..d260cbbe18c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -41,6 +41,8 @@
#include "commands/async.h"
#include "commands/extension.h"
#include "commands/event_trigger.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -533,6 +535,15 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+bool progressive_explain = false;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = true;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+bool progressive_explain_costs = true;
+int progressive_explain_interval = 0;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2131,6 +2142,83 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on buffer usage is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_costs", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on the estimated startup and total cost of each plan node is added to progressive explains."),
+ gettext_noop("Equivalent to the COSTS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_costs,
+ true,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3848,6 +3936,18 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5396,6 +5496,16 @@ struct config_enum ConfigureNamesEnum[] =
NULL, assign_io_method, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 7c12434efa2..f58d9745105 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -670,6 +670,19 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_interval = 0
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = on
+#progressive_explain_costs = on
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8b68b16d79d..69092d9ccc8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12493,4 +12493,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progressive explain plans of active backends',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => '',
+ proallargtypes => '{oid,int4,timestamptz,text}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{datid,pid,last_update,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain_progressive.h b/src/include/commands/explain_progressive.h
new file mode 100644
index 00000000000..0926680c15e
--- /dev/null
+++ b/src/include/commands/explain_progressive.h
@@ -0,0 +1,50 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.h
+ * prototypes for explain_progressive.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * src/include/commands/explain_progressive.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXPLAIN_PROGRESSIVE_H
+#define EXPLAIN_PROGRESSIVE_H
+
+#include "datatype/timestamp.h"
+#include "executor/executor.h"
+
+typedef struct ProgressiveExplainHashEntry
+{
+ int pid; /* hash key of entry - MUST BE FIRST */
+ int plan_alloc_size;
+ dsa_handle pe_h;
+ dsa_pointer pe_p;
+} ProgressiveExplainHashEntry;
+
+typedef struct ProgressiveExplainHashData
+{
+ int pid;
+ TimestampTz last_update;
+ char plan[FLEXIBLE_ARRAY_MEMBER];
+} ProgressiveExplainHashData;
+
+extern bool ProgressiveExplainIsActive(QueryDesc *queryDesc);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainHashShmemSize(void);
+extern void InitProgressiveExplainHash(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+/* transaction cleanup code */
+extern void AtEOXact_ProgressiveExplain(bool isCommit);
+extern void AtEOSubXact_ProgressiveExplain(bool isCommit, int nestDepth);
+
+extern bool ProgressiveExplainPending;
+
+#endif /* EXPLAIN_PROGRESSIVE_H */
diff --git a/src/include/commands/explain_state.h b/src/include/commands/explain_state.h
index 32728f5d1a1..64370a5d6ea 100644
--- a/src/include/commands/explain_state.h
+++ b/src/include/commands/explain_state.h
@@ -16,6 +16,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "parser/parse_node.h"
+#include "utils/dsa.h"
typedef enum ExplainSerializeOption
{
@@ -74,6 +75,14 @@ typedef struct ExplainState
/* extensions */
void **extension_state;
int extension_state_allocated;
+ /* set if tracking a progressive explain */
+ bool progressive;
+ /* current plan node in progressive explains */
+ struct PlanState *pe_curr_node;
+ /* reusable instr object used in progressive explains */
+ struct Instrumentation *pe_local_instr;
+ /* dsa area used to store progressive explain data */
+ dsa_area *pe_a;
} ExplainState;
typedef void (*ExplainOptionHandler) (ExplainState *, DefElem *, ParseState *);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..27692aee542 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pestate; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5b6cadb5a6c..b7d2d0458de 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -769,6 +770,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1165,6 +1169,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+ ExecProcNodeMtd ExecProcNodeOriginal; /* pointer to the original function
+ * when another wrapper was added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 4df1d25c045..a8cff27646c 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
LWTRANCHE_AIO_URING_COMPLETION,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 932024b1b0b..7d88e7e9b58 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -84,3 +84,4 @@ PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
PG_LWLOCK(53, AioWorkerSubmissionQueue)
+PG_LWLOCK(54, ProgressiveExplainHash)
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index f619100467d..6bb9d36b003 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,15 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT bool progressive_explain;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
+extern PGDLLIMPORT bool progressive_explain_costs;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
@@ -322,6 +331,7 @@ extern PGDLLIMPORT const struct config_enum_entry io_method_options[];
extern PGDLLIMPORT const struct config_enum_entry recovery_target_action_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_level_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_sync_method_options[];
+extern PGDLLIMPORT const struct config_enum_entry explain_format_options[];
/*
* Functions exported by guc.c
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..f2751c5b4df 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,7 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..895031524ec
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,128 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the local
+# session, even if progressive explain is enabled. For other sessions,
+# pg_stat_progress_explain should contain data for a PID only if progressive_explain
+# is enabled and a query is running. Data needs to be removed when the query
+# finishes (or gets cancelled).
+
+use strict;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
+# Configure progressive explain plans to be published immediately
+$node->append_conf('postgresql.conf', 'progressive_explain_interval = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled pg_stat_progress_explain should not contain
+ # row for PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # 1 row for pid is expected for running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('on');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('on');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 47478969135..62b70cf4618 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,13 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT s.datid,
+ d.datname,
+ s.pid,
+ s.last_update,
+ s.query_plan
+ FROM (pg_stat_progress_explain() s(datid, pid, last_update, query_plan)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b66cecd8799..412e47eda9a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2302,6 +2302,8 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplainHashData
+ProgressiveExplainHashEntry
ProjectSet
ProjectSetPath
ProjectSetState
--
2.43.0
On Fri, Mar 7, 2025 at 6:43 AM Rafael Thofehrn Castro
<rafaelthca@gmail.com> wrote:
The wrapper code was implemented by torikoshia
(torikoshia(at)oss(dot)nttdata(dot)com),
so adding the credits here.
On Thu, Mar 20, 2025 at 5:35 AM Robert Haas <robertmhaas@gmail.com>
wrote:
Without having the prior discussion near to hand, I *think* that the
reason we wanted to do this wrap/unwrap stuff is to make it so that
the progressive EXPLAIN code could only execute when entering a new
plan node rather than at any random point inside of that plan node,
and that does seem a lot safer than the alternative.
Your assumption is correct. Various approaches were suggested, such as
picking a small number of safe and sufficient places for this feature or
classifying CFI() calls into safe and unsafe ones. However, in the end,
the wrapping approach [1] was the only one that remained.
On 2025-03-30 02:51, Rafael Thofehrn Castro wrote:
Implemented this version. New patch has the following characteristics:
I haven't looked into the code yet, but when I ran the commands below during
make installcheck, there was an error and an assertion failure:
=# select * from pg_stat_progress_explain;
=# \watch 0.1
ERROR: could not attach to dynamic shared area
WARNING: terminating connection because of crash of another server
process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File: "ruleutils.c", Line: 8802, PID: 73180
TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File: "ruleutils.c", Line: 8802, PID: 73181
TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File: "ruleutils.c", Line: 8802, PID: 73183
TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File: "ruleutils.c", Line: 8802, PID: 73182
(the interleaved backtraces of the crashed parallel workers are identical; one copy follows)
0   postgres   0x000000010365f5c4   ExceptionalCondition + 236
1   postgres   0x00000001035a7830   get_parameter + 1076
2   postgres   0x000000010359ff2c   get_rule_expr + 276
3   postgres   0x00000001035a841c   get_rule_expr_paren + 168
4   postgres   0x00000001035a82c4   get_oper_expr + 292
5   postgres   0x00000001035a01f8   get_rule_expr + 992
6   postgres   0x0000000103598520   deparse_expression_pretty + 176
7   postgres   0x0000000103598e78   deparse_expression + 76
8   postgres   0x0000000102f94198   show_expression + 100
9   postgres   0x0000000102f97690   show_qual + 112
10  postgres   0x0000000102f93734   show_scan_qual + 132
11  postgres   0x0000000102f90680   ExplainNode + 6828
12  postgres   0x0000000102f8d398   ExplainPrintPlan + 540
13  postgres   0x0000000102f9b974   ProgressiveExplainPrint + 72
14  postgres   0x0000000102f9b920   ProgressiveExplainStart + 660
15  postgres   0x00000001030771e0   standard_ExecutorStart + 984
16  postgres   0x0000000103076de8   ExecutorStart + 112
17  postgres   0x0000000103080d90   ParallelQueryMain + 292
18  postgres   0x0000000102df7ef8   ParallelWorkerMain + 1712
19  postgres   0x00000001032a5d60   BackgroundWorkerMain + 824
20  postgres   0x00000001032a9ee8   postmaster_child_launch + 492
21  postgres   0x00000001032b4c10   StartBackgroundWorker + 416
22  postgres   0x00000001032af9d8   maybe_start_bgworkers + 552
23  postgres   0x00000001032b26cc   LaunchMissingBackgroundProcesses + 1316
24  postgres   0x00000001032afcb0   ServerLoop + 616
25  postgres   0x00000001032ae55c   PostmasterMain + 6632
26  postgres   0x0000000103121160   main + 952
27  dyld       0x000000019cdc0274   start + 2840
[1]: /messages/by-id/ac6c51071316279bf903078cf264c37a@oss.nttdata.com
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
On 3/31/25 02:23, torikoshia wrote:
On Fri, Mar 7, 2025 at 6:43 AM Rafael Thofehrn Castro
Implemented this version. New patch has the following characteristics:
I haven't looked into the code yet, but when I ran the commands below during
make installcheck, there was an error and an assertion failure:
=# select * from pg_stat_progress_explain;
=# \watch 0.1
Yeah, that's to be expected.
I think many corner cases may be found: a hash table in the middle of
being filled, open file descriptors, an incorrect combination of variables,
'not yet visited' subtrees - who knows what else? So, last time, I
just ended up with the idea that using the explain code in the middle of
execution is a bit dangerous; it is enough to expose only basic data -
rows, numbers and timings - which seems safe to gather.
--
regards, Andrei Lepikhov
Thanks for this valuable testing. I think this is actually a really
good idea for how to test something like this, because the regression
tests contain lots of different queries that do lots of weird things.
On Sun, Mar 30, 2025 at 8:23 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
I haven't looked into the code yet, but when I ran the commands below during
make installcheck, there was an error and an assertion failure:
=# select * from pg_stat_progress_explain;
=# \watch 0.1
ERROR: could not attach to dynamic shared area
This seems like a race condition. Probably some backend's dsa_area
went away between the time we got a pointer to it and the time we
actually attached to it. We should be able to find some way of
handling this without an error, like treating the case where the DSA
area is missing the same as the case where there was no DSA pointer in
the first place. However, this is also making me wonder if we
shouldn't be using one DSA shared by all backends rather than a
separate DSA area for every backend. That would require more care to
avoid leaks, but I'm not sure that it's a good idea to be creating and
destroying a DSA area for every single query. But I'm not 100% sure
that's a problem.
TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File:
"ruleutils.c", Line: 8802, PID: 73180
TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File:
"ruleutils.c", Line: 8802, PID: 73181
I wonder what is happening here. One systemic danger of the patch is
that there can be a difference between what must be true at the *end*
of a query and what must be true *during* a query. Anything that can't
happen at the end but can happen in the middle is something that the
patch will need to do something about in order to work properly. But I
don't see how that can explain this failure, because AFAIR the patch
just prints the same things that would have been printed by any other
EXPLAIN, with the same stack of namespaces. It seems possible that
this is a pre-existing bug: the regression tests might contain a query
that would cause EXPLAIN to fail, but because the regression tests
don't actually EXPLAIN that query, no failure occurs. But it could
also be something else; for example, maybe this patch is trying to
EXPLAIN something that couldn't be used with a regular EXPLAIN for
some reason. Or maybe the patch doesn't actually succeed in doing the
EXPLAIN with the correct namespace stack in all cases.
--
Robert Haas
EDB: http://www.enterprisedb.com
I haven't looked into the code yet, but when I ran below commands during
make installcheck, there was an error and an assertion failure
Thanks for the report. I actually made a nasty mistake in the last
patch while refactoring the code: it does not properly check whether
a QueryDesc is already being tracked. As a result, every subquery call
in the same query allocates a DSA, and those segments are not
properly cleared. So the patch is broken, which probably explains
your crashes.
I just made a one-line fix and make installcheck now looks clean. I will
do more tests before sending another version.
Hello again,
ERROR: could not attach to dynamic shared area
In addition to that refactoring issue, the current patch had a race
condition where pg_stat_progress_explain could try to access the DSA of a
process whose query gets aborted.
While discussing with Robert we agreed that it would be wiser to take
a step back and change the strategy used to share progressive explain
data in shared memory.
Instead of using per-backend DSAs shared via a hash structure, I now
define a dsa_pointer and an LWLock in each backend's PGPROC.
A global DSA is created by the first backend that attempts to use
the progressive explain feature. After the DSA is created, subsequent
uses of the feature just allocate memory there and reference it
via PGPROC's dsa_pointer.
This solves the race condition reported by Torikoshi and improves
concurrency, as we no longer have a global LWLock controlling shared
memory access, but a per-backend LWLock.
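For illustration, the reader side under this design boils down to something
like the sketch below (the helper name is made up; peLock, peDSAPointer,
ProgressiveExplainData and its plan field match the attached patch):

static char *
read_progressive_plan(PGPROC *proc, dsa_area *area)
{
    char       *plan = NULL;

    /* Only the target backend's lock is taken, not a global one */
    LWLockAcquire(&proc->peLock, LW_SHARED);

    if (DsaPointerIsValid(proc->peDSAPointer))
    {
        ProgressiveExplainData *pe_data =
            dsa_get_address(area, proc->peDSAPointer);

        /* Copy the plan text out while holding the lock */
        plan = pstrdup(pe_data->plan);
    }

    LWLockRelease(&proc->peLock);

    return plan;
}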
I performed the same tests done by Torikoshi and it looks like we are
good now, even with more frequent inspections of pg_stat_progress_explain
(\watch 0.01).
Rafael.
Attachments:
v15-0001-Proposal-for-progressive-explains.patch
From b505821deafc0aff6556c9608915f7b2bd0d1dce Mon Sep 17 00:00:00 2001
From: rafaelthca <rafaelthca@gmail.com>
Date: Sat, 29 Mar 2025 14:19:46 -0300
Subject: [PATCH v15] Proposal for progressive explains
This proposal introduces a feature to print execution plans of active
queries in an in-memory shared DSA so that other sessions can visualize
them via the new view pg_stat_progress_explain.
Plans are only printed if the new GUC parameter progressive_explain is
enabled.
When the new GUC progressive_explain_interval is set to 0, the plan is
printed only at query start. If set to any other value, the QueryDesc
is adjusted to add instrumentation flags. In that case the plan is
printed on a fixed interval controlled by progressive_explain_interval,
including all instrumentation stats computed so far (per node rows and
execution time).
New view:
- pg_stat_progress_explain
- datid: OID of the database
- datname: Name of the database
- pid: PID of the process running the query
- last_update: timestamp when plan was last printed
- query_plan: the actual plan (limited read privileges)
New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
- type: bool
- default: off
- context: user
- progressive_explain_interval: interval between each explain print.
- type: int
- default: 0
- min: 0
- context: user
- progressive_explain_format: format used to print the plans.
- type: enum
- default: text
- values: [TEXT, XML, JSON, or YAML]
- context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_buffers: controls whether buffers details are
added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_timing: controls whether per node timing details
are added to the printed plan.
- type: bool
- default: true
- context: user
- progressive_explain_wal: controls whether WAL record generation
details are added to the printed plan.
- type: bool
- default: off
- context: user
- progressive_explain_costs: controls whether estimated startup and
total cost of each plan node is added to the printed plan.
- type: bool
- default: true
- context: user
---
contrib/auto_explain/auto_explain.c | 10 +-
doc/src/sgml/config.sgml | 156 +++++
doc/src/sgml/monitoring.sgml | 82 +++
doc/src/sgml/perform.sgml | 97 ++++
src/backend/access/transam/xact.c | 15 +
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/Makefile | 1 +
src/backend/commands/explain.c | 220 +++++---
src/backend/commands/explain_format.c | 12 +
src/backend/commands/explain_progressive.c | 532 ++++++++++++++++++
src/backend/commands/meson.build | 1 +
src/backend/executor/execMain.c | 21 +
src/backend/executor/execProcnode.c | 16 +-
src/backend/executor/instrument.c | 20 +-
src/backend/storage/ipc/ipci.c | 7 +
src/backend/storage/lmgr/lwlock.c | 2 +
src/backend/storage/lmgr/proc.c | 6 +
src/backend/tcop/pquery.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/init/postinit.c | 10 +
src/backend/utils/misc/guc_tables.c | 110 ++++
src/backend/utils/misc/postgresql.conf.sample | 13 +
src/include/catalog/pg_proc.dat | 10 +
src/include/commands/explain_progressive.h | 42 ++
src/include/commands/explain_state.h | 7 +
src/include/executor/execdesc.h | 1 +
src/include/executor/instrument.h | 1 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lwlock.h | 2 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/proc.h | 5 +
src/include/utils/guc.h | 10 +
src/include/utils/timeout.h | 1 +
.../test_misc/t/008_progressive_explain.pl | 128 +++++
src/test/regress/expected/rules.out | 7 +
src/tools/pgindent/typedefs.list | 1 +
36 files changed, 1482 insertions(+), 85 deletions(-)
create mode 100644 src/backend/commands/explain_progressive.c
create mode 100644 src/include/commands/explain_progressive.h
create mode 100644 src/test/modules/test_misc/t/008_progressive_explain.pl
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index cd6625020a7..0d28ae2ffe1 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -42,14 +42,6 @@ static int auto_explain_log_level = LOG;
static bool auto_explain_log_nested_statements = false;
static double auto_explain_sample_rate = 1;
-static const struct config_enum_entry format_options[] = {
- {"text", EXPLAIN_FORMAT_TEXT, false},
- {"xml", EXPLAIN_FORMAT_XML, false},
- {"json", EXPLAIN_FORMAT_JSON, false},
- {"yaml", EXPLAIN_FORMAT_YAML, false},
- {NULL, 0, false}
-};
-
static const struct config_enum_entry loglevel_options[] = {
{"debug5", DEBUG5, false},
{"debug4", DEBUG4, false},
@@ -191,7 +183,7 @@ _PG_init(void)
NULL,
&auto_explain_log_format,
EXPLAIN_FORMAT_TEXT,
- format_options,
+ explain_format_options,
PGC_SUSET,
0,
NULL,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fea683cb49c..2d611720d8d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8689,6 +8689,162 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-progressive-explain" xreflabel="progressive_explain">
+ <term><varname>progressive_explain</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Determines whether progressive explains are enabled and how
+ they are executed; see <xref linkend="using-explain-progressive"/>.
+ Possible values are <literal>off</literal>, <literal>explain</literal>
+ and <literal>analyze</literal>. The default is <literal>off</literal>.
+ When set to <literal>explain</literal> the plan will be printed only
+ once after <xref linkend="guc-progressive-explain-min-duration"/>. If
+ set to <literal>analyze</literal>, instrumentation flags are enabled,
+ causing the plan to be printed on a fixed interval controlled by
+ <xref linkend="guc-progressive-explain-interval"/> including all
+ instrumentation stats computed so far.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-min-duration" xreflabel="progressive_explain_min_duration">
+ <term><varname>progressive_explain_min_duration</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the threshold (in milliseconds) before the progressive explain
+ plan is printed for the first time. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-interval" xreflabel="progressive_explain_interval">
+ <term><varname>progressive_explain_interval</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_interval</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the interval (in milliseconds) between each instrumented
+ progressive explain. The default is <literal>1s</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-buffers" xreflabel="progressive_explain_buffers">
+ <term><varname>progressive_explain_buffers</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_buffers</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on buffer usage is added to
+ progressive explains. Equivalent to the BUFFERS option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-timing" xreflabel="progressive_explain_timing">
+ <term><varname>progressive_explain_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on per node timing is added
+ to progressive explains. Equivalent to the TIMING option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-wal" xreflabel="progressive_explain_wal">
+ <term><varname>progressive_explain_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on WAL record generation is
+ added to progressive explains. Equivalent to the WAL option of
+ EXPLAIN. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-verbose" xreflabel="progressive_explain_verbose">
+ <term><varname>progressive_explain_verbose</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_verbose</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether verbose details are added to progressive explains.
+ Equivalent to the VERBOSE option of EXPLAIN. The default is
+ <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-settings" xreflabel="progressive_explain_settings">
+ <term><varname>progressive_explain_settings</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_settings</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on modified configuration is added to
+ progressive explains. Equivalent to the SETTINGS option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-costs" xreflabel="progressive_explain_costs">
+ <term><varname>progressive_explain_costs</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_costs</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether information on the estimated startup and total cost of
+ each plan node is added to progressive explains. Equivalent to the COSTS
+ option of EXPLAIN.
+ The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-progressive-explain-format" xreflabel="progressive_explain_format">
+ <term><varname>progressive_explain_format</varname> (<type>enum</type>)
+ <indexterm>
+ <primary><varname>progressive_explain_format</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Selects the EXPLAIN output format to be used with progressive
+ explains. Equivalent to the FORMAT option of EXPLAIN. The default
+ is <literal>text</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a6d67d2fbaa..4c2a97bd54f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6842,6 +6842,88 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
+<sect2 id="explain-progress-reporting">
+ <title>EXPLAIN Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_explain</primary>
+ </indexterm>
+
+ <para>
+ Whenever a client backend or parallel worker is running a query with
+ <xref linkend="guc-progressive-explain"/> enabled, the
+ <structname>pg_stat_progress_explain</structname> view will contain a
+ corresponding row with query plan details; see
+ <xref linkend="using-explain-progressive"/>. The table below describe the
+ information that will be reported.
+ </para>
+
+ <table id="pg-stat-progress-explain-view" xreflabel="pg_stat_progress_explain">
+ <title><structname>pg_stat_progress_explain</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of a client backend or parallel worker.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>last_update</structfield> <type>timestamp with time zone</type>
+ </para>
+ <para>
+ Timestamp when plan was last printed.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_plan</structfield> <type>text</type>
+ </para>
+ <para>
+ The progressive explain text.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="dynamic-trace">
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 387baac7e8c..04a78f29df9 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1169,6 +1169,103 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000
</para>
</sect2>
+ <sect2 id="using-explain-progressive">
+ <title>Progressive <command>EXPLAIN</command></title>
+
+ <para>
+ The query plan created by the planner for any given active query can
+ be visualized by any session via the <xref linkend="pg-stat-progress-explain-view"/>
+ view when <xref linkend="guc-progressive-explain"/> is enabled in the
+ client backend or parallel worker executing the query and after the
+ minimum duration specified by <xref linkend="guc-progressive-explain-min-duration"/>
+ has passed. Settings <xref linkend="guc-progressive-explain-timing"/>,
+ <xref linkend="guc-progressive-explain-buffers"/> and
+ <xref linkend="guc-progressive-explain-wal"/> control which additional
+ instrumentation details are collected and included in the output, while
+ <xref linkend="guc-progressive-explain-format"/>,
+ <xref linkend="guc-progressive-explain-verbose"/>,
+ <xref linkend="guc-progressive-explain-settings"/> and
+ <xref linkend="guc-progressive-explain-costs"/>
+ define how the plan is printed and which details are added to it.
+ </para>
+
+ <para>
+ When <xref linkend="guc-progressive-explain"/> is set to <literal>explain</literal>
+ the plan will be printed once at the beginning of the query.
+ </para>
+
+ <para>
+<screen>
+SET progressive_explain = 'explain';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:01.606324-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) +
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9)+
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ Setting <xref linkend="guc-progressive-explain"/> to <literal>analyze</literal>
+ will enable instrumentation and the detailed plan is printed on a fixed interval
+ controlled by <xref linkend="guc-progressive-explain-interval"/>, including
+ per node accumulated row count and other statistics.
+ </para>
+
+ <para>
+ Progressive explains include additional per-node information to help analyze
+ execution progress:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ current: the plan node currently being processed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ never executed: a plan node not processed yet.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+<screen>
+SET progressive_explain = 'analyze';
+SET
+
+SELECT * FROM test t1 INNER JOIN test t2 ON t1.c1=t2.c1;
+</screen>
+ </para>
+ <para>
+<screen>
+SELECT * FROM pg_stat_progress_explain;
+datid | datname | pid | last_update | query_plan
+-------+----------+-------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------
+ 5 | postgres | 73972 | 2025-03-13 23:41:53.951884-03 | Hash Join (cost=327879.85..878413.95 rows=9999860 width=18) (actual time=1581.469..2963.158 rows=64862.00 loops=1) +
+ | | | | Hash Cond: (t1.c1 = t2.c1) +
+ | | | | -> Seq Scan on test t1 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.079..486.702 rows=8258962.00 loops=1) (current)+
+ | | | | -> Hash (cost=154053.60..154053.60 rows=9999860 width=9) (actual time=1580.933..1580.933 rows=10000000.00 loops=1) +
+ | | | | -> Seq Scan on test t2 (cost=0.00..154053.60 rows=9999860 width=9) (actual time=0.004..566.961 rows=10000000.00 loops=1) +
+ | | | |
+(1 row)
+</screen>
+ </para>
+
+ </sect2>
+
</sect1>
<sect1 id="planner-stats">
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b885513f765..28294d802e1 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,6 +36,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "common/pg_prng.h"
@@ -2423,6 +2424,12 @@ CommitTransaction(void)
/* Clean up the type cache */
AtEOXact_TypeCache();
+ /*
+ * If progressive explain state wasn't properly cleaned up after the query
+ * ended, perform the cleanup and warn about leaked resources.
+ */
+ AtEOXact_ProgressiveExplain(true);
+
/*
* Make catalog changes visible to all backends. This has to happen after
* relcache references are dropped (see comments for
@@ -2993,6 +3000,7 @@ AbortTransaction(void)
AtEOXact_PgStat(false, is_parallel_worker);
AtEOXact_ApplyLauncher(false);
AtEOXact_LogicalRepWorkers(false);
+ AtEOXact_ProgressiveExplain(false);
pgstat_report_xact_timestamp(0);
}
@@ -5193,6 +5201,12 @@ CommitSubTransaction(void)
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
+ /*
+ * If progressive explain state wasn't properly cleaned up after the
+ * subtransaction ended, perform the cleanup and warn about leaked resources.
+ */
+ AtEOSubXact_ProgressiveExplain(true, s->nestingLevel);
+
/*
* We need to restore the upper transaction's read-only state, in case the
* upper is read-write while the child is read-only; GUC will incorrectly
@@ -5361,6 +5375,7 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
+ AtEOSubXact_ProgressiveExplain(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31d269b7ee0..767735c1a2c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1334,6 +1334,16 @@ CREATE VIEW pg_stat_progress_copy AS
FROM pg_stat_get_progress_info('COPY') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ S.datid AS datid,
+ D.datname AS datname,
+ S.pid,
+ S.last_update,
+ S.query_plan
+ FROM pg_stat_progress_explain() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
CREATE VIEW pg_user_mappings AS
SELECT
U.oid AS umid,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index cb2fbdc7c60..e10224b2cd2 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -36,6 +36,7 @@ OBJS = \
explain.o \
explain_dr.o \
explain_format.o \
+ explain_progressive.o \
explain_state.o \
extension.o \
foreigncmds.o \
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ef8aa489af8..179a1f8792b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -20,6 +20,7 @@
#include "commands/explain.h"
#include "commands/explain_dr.h"
#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
#include "commands/explain_state.h"
#include "commands/prepare.h"
#include "foreign/fdwapi.h"
@@ -139,7 +140,7 @@ static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instr, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -596,6 +597,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
/* We can't run ExecutorEnd 'till we're done printing the stats... */
totaltime += elapsed_time(&starttime);
}
+ else
+ {
+ /*
+ * Handle progressive explain cleanup manually if enabled as it is not
+ * called as part of ExecutorFinish.
+ */
+ if (progressive_explain)
+ ProgressiveExplainFinish(queryDesc);
+ }
/* grab serialization metrics before we destroy the DestReceiver */
if (es->serialize != EXPLAIN_SERIALIZE_NONE)
@@ -1371,6 +1381,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *partialmode = NULL;
const char *operation = NULL;
const char *custom_name = NULL;
+ Instrumentation *local_instr = NULL;
ExplainWorkersState *save_workers_state = es->workers_state;
int save_indent = es->indent;
bool haschildren;
@@ -1834,53 +1845,90 @@ ExplainNode(PlanState *planstate, List *ancestors,
* instrumentation results the user didn't ask for. But we do the
* InstrEndLoop call anyway, if possible, to reduce the number of cases
* auto_explain has to contend with.
+ *
+ * For regular explains instrumentation clean up is called directly in the
+ * main instrumentation objects. Progressive explains need to clone
+ * instrumentation object and forcibly end the loop in nodes that may be
+ * running.
*/
if (planstate->instrument)
- InstrEndLoop(planstate->instrument);
-
- if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
-
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
{
- appendStringInfo(es->str, " (actual ");
-
- if (es->timing)
- appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
+ local_instr = es->pe_local_instr;
+ *local_instr = *planstate->instrument;
- appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
}
+ /* Use main instrumentation */
else
{
- if (es->timing)
- {
- ExplainPropertyFloat("Actual Startup Time", "ms", startup_ms,
- 3, es);
- ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
- 3, es);
- }
- ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
- ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+ local_instr = planstate->instrument;
+ InstrEndLoop(local_instr);
}
}
- else if (es->analyze)
+
+ /*
+ * Additional query execution details should only be included if
+ * instrumentation is enabled and, if progressive explain is enabled, it
+ * is configured to update the plan more than once.
+ */
+ if (es->analyze &&
+ (!es->progressive ||
+ (es->progressive && progressive_explain_interval > 0)))
{
- if (es->format == EXPLAIN_FORMAT_TEXT)
- appendStringInfoString(es->str, " (never executed)");
+ if (local_instr && local_instr->nloops > 0)
+ {
+ double nloops = local_instr->nloops;
+ double startup_ms = 1000.0 * local_instr->startup / nloops;
+ double total_ms = 1000.0 * local_instr->total / nloops;
+ double rows = local_instr->ntuples / nloops;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ appendStringInfo(es->str, " (actual ");
+
+ if (es->timing)
+ appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
+
+ appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops);
+
+ if (es->progressive && planstate == es->pe_curr_node)
+ appendStringInfo(es->str, " (current)");
+ }
+ else
+ {
+ if (es->timing)
+ {
+ ExplainPropertyFloat("Actual Startup Time", "ms", startup_ms,
+ 3, es);
+ ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
+ 3, es);
+ }
+ ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es);
+ ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ /* Progressive explain. Add current node flag if applicable */
+ if (es->progressive && planstate == es->pe_curr_node)
+ ExplainPropertyBool("Current", true, es);
+ }
+ }
else
{
- if (es->timing)
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ appendStringInfoString(es->str, " (never executed)");
+ else
{
- ExplainPropertyFloat("Actual Startup Time", "ms", 0.0, 3, es);
- ExplainPropertyFloat("Actual Total Time", "ms", 0.0, 3, es);
+ if (es->timing)
+ {
+ ExplainPropertyFloat("Actual Startup Time", "ms", 0.0, 3, es);
+ ExplainPropertyFloat("Actual Total Time", "ms", 0.0, 3, es);
+ }
+ ExplainPropertyFloat("Actual Rows", NULL, 0.0, 0, es);
+ ExplainPropertyFloat("Actual Loops", NULL, 0.0, 0, es);
}
- ExplainPropertyFloat("Actual Rows", NULL, 0.0, 0, es);
- ExplainPropertyFloat("Actual Loops", NULL, 0.0, 0, es);
}
}
@@ -1970,13 +2018,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_indexsearches_info(planstate, es);
break;
case T_IndexOnlyScan:
@@ -1984,16 +2032,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->recheckqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
- planstate->instrument->ntuples2, 0, es);
+ local_instr->ntuples2, 0, es);
show_indexsearches_info(planstate, es);
break;
case T_BitmapIndexScan:
@@ -2006,11 +2054,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ local_instr, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
@@ -2027,7 +2075,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (IsA(plan, CteScan))
show_ctescan_info(castNode(CteScanState, planstate), es);
break;
@@ -2038,7 +2086,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -2062,7 +2110,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -2096,7 +2144,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -2110,7 +2158,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_table_func_scan_info(castNode(TableFuncScanState,
planstate), es);
break;
@@ -2128,7 +2176,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_TidRangeScan:
@@ -2145,14 +2193,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -2162,7 +2210,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -2172,11 +2220,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -2185,11 +2233,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -2198,11 +2246,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ local_instr, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ local_instr, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
@@ -2210,7 +2258,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_hashagg_info((AggState *) planstate, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_WindowAgg:
show_window_def(castNode(WindowAggState, planstate), ancestors, es);
@@ -2219,7 +2267,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
show_windowagg_info(castNode(WindowAggState, planstate), es);
break;
case T_Group:
@@ -2227,7 +2275,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -2249,7 +2297,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ local_instr, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -2294,10 +2342,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer/WAL usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
- if (es->wal && planstate->instrument)
- show_wal_usage(es, &planstate->instrument->walusage);
+ if (es->buffers && local_instr)
+ show_buffer_usage(es, &local_instr->bufusage);
+ if (es->wal && local_instr)
+ show_wal_usage(es, &local_instr->walusage);
/* Prepare per-worker buffer/WAL usage */
if (es->workers_state && (es->buffers || es->wal) && es->verbose)
@@ -3975,19 +4023,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instr, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instr)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instr->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instr->nfiltered1;
+ nloops = instr->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -4668,7 +4716,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, (&mtstate->ps)->instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
@@ -4677,11 +4725,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double total;
double insert_path;
double other_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
other_path = mtstate->ps.instrument->ntuples2;
insert_path = total - other_path;
@@ -4701,11 +4762,24 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
double update_path;
double delete_path;
double skipped_path;
+ Instrumentation *local_instr;
- InstrEndLoop(outerPlanState(mtstate)->instrument);
+ /* Progressive explain. Use auxiliary instrumentation object */
+ if (es->progressive)
+ {
+ local_instr = es->pe_local_instr;
+ *local_instr = *outerPlanState(mtstate)->instrument;
+ /* Force end loop even if node is in progress */
+ InstrEndLoopForce(local_instr);
+ }
+ else
+ {
+ local_instr = outerPlanState(mtstate)->instrument;
+ InstrEndLoop(local_instr);
+ }
/* count the number of source rows */
- total = outerPlanState(mtstate)->instrument->ntuples;
+ total = local_instr->ntuples;
insert_path = mtstate->mt_merge_inserted;
update_path = mtstate->mt_merge_updated;
delete_path = mtstate->mt_merge_deleted;
diff --git a/src/backend/commands/explain_format.c b/src/backend/commands/explain_format.c
index 752691d56db..c0d6973d1e5 100644
--- a/src/backend/commands/explain_format.c
+++ b/src/backend/commands/explain_format.c
@@ -16,6 +16,7 @@
#include "commands/explain.h"
#include "commands/explain_format.h"
#include "commands/explain_state.h"
+#include "utils/guc_tables.h"
#include "utils/json.h"
#include "utils/xml.h"
@@ -25,6 +26,17 @@
#define X_CLOSE_IMMEDIATE 2
#define X_NOWHITESPACE 4
+/*
+ * GUC support
+ */
+const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};
+
static void ExplainJSONLineEnding(ExplainState *es);
static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
static void ExplainYAMLLineStarting(ExplainState *es);
diff --git a/src/backend/commands/explain_progressive.c b/src/backend/commands/explain_progressive.c
new file mode 100644
index 00000000000..d6e588aa53b
--- /dev/null
+++ b/src/backend/commands/explain_progressive.c
@@ -0,0 +1,532 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.c
+ * Code for the progressive explain feature
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/explain_progressive.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "catalog/pg_authid.h"
+#include "commands/explain.h"
+#include "commands/explain_format.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "storage/procarray.h"
+#include "utils/acl.h"
+#include "utils/backend_status.h"
+#include "utils/builtins.h"
+#include "utils/guc_tables.h"
+#include "utils/timeout.h"
+
+
+#define PROGRESSIVE_EXPLAIN_FREE_SIZE 4096
+
+/* Global DSA handle */
+static dsa_handle *progressiveExplainDSAHandle = NULL;
+
+/* Pointer to the tracked query */
+static QueryDesc *activeQueryDesc = NULL;
+
+/* Transaction nest level of the tracked query */
+static int activeQueryXactNestLevel = -1;
+
+/*
+ * Flag set by timeout function to control when to update
+ * instrumented progressive explains.
+ *
+ */
+bool ProgressiveExplainPending = false;
+
+static void ProgressiveExplainPrint(QueryDesc *queryDesc);
+static void ProgressiveExplainCleanup(bool isCommit);
+
+
+
+/*
+ * ProgressiveExplainSetup -
+ * Track query descriptor and adjust instrumentation.
+ *
+ * If progressive explain is enabled and configured to collect
+ * instrumentation details we adjust QueryDesc accordingly even
+ * if the query was not initiated with EXPLAIN ANALYZE. This will
+ * directly affect query execution and add computation overhead.
+ */
+void
+ProgressiveExplainSetup(QueryDesc *queryDesc)
+{
+ /* Set up only if this is the outermost query */
+ if (activeQueryDesc == NULL)
+ {
+ activeQueryDesc = queryDesc;
+ activeQueryXactNestLevel = GetCurrentTransactionNestLevel();
+
+ /*
+ * Enable instrumentation if the plan will be updated more than once.
+ */
+ if (progressive_explain_interval > 0)
+ {
+ if (progressive_explain_timing)
+ queryDesc->instrument_options |= INSTRUMENT_TIMER;
+ else
+ queryDesc->instrument_options |= INSTRUMENT_ROWS;
+ if (progressive_explain_buffers)
+ queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (progressive_explain_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
+ }
+ }
+}
+
+/*
+ * ProgressiveExplainStart -
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *
+ * We define an ExplainState that will be reused in every iteration of
+ * plan updates.
+ *
+ * Progressive explain plans are updated in shared memory via global DSA
+ * allocated by the first backend that runs this code.
+ *
+ * A periodic timeout is configured to update the plan at fixed intervals if
+ * progressive explain is configured with instrumentation enabled. Otherwise
+ * the plan is only printed once.
+ */
+void
+ProgressiveExplainStart(QueryDesc *queryDesc)
+{
+ ExplainState *es;
+
+ /*
+ * Progressive explain is only done for the outermost query descriptor.
+ */
+ if (queryDesc != activeQueryDesc)
+ return;
+
+ /* Initialize ExplainState to be used for all plan updates */
+ es = NewExplainState();
+ queryDesc->pestate = es;
+
+ /* Local instrumentation object to be reused for every node */
+ es->pe_local_instr = palloc0(sizeof(Instrumentation));
+
+ /*
+ * Mark ExplainState as progressive so that ExplainNode() function uses a
+ * special logic when printing the plan.
+ */
+ es->progressive = true;
+
+ es->analyze = (queryDesc->instrument_options &&
+ (progressive_explain_interval > 0));
+ es->buffers = (es->analyze && progressive_explain_buffers);
+ es->wal = (es->analyze && progressive_explain_wal);
+ es->timing = (es->analyze && progressive_explain_timing);
+ es->summary = (es->analyze);
+ es->format = progressive_explain_format;
+ es->verbose = progressive_explain_verbose;
+ es->settings = progressive_explain_settings;
+ es->costs = progressive_explain_costs;
+
+	/*
+	 * We need the global lock in exclusive mode to check whether the global
+	 * DSA was already created, and to create it if it was not.
+	 */
+ LWLockAcquire(ProgressiveExplainLock, LW_EXCLUSIVE);
+ if (*progressiveExplainDSAHandle == 0)
+ {
+ /*
+ * Create the DSA and pin it so that it persists regardless of
+ * existing backends.
+ */
+ dsa_area *a = dsa_create(LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA);
+
+ dsa_pin(a);
+ *progressiveExplainDSAHandle = dsa_get_handle(a);
+ dsa_detach(a);
+ }
+ LWLockRelease(ProgressiveExplainLock);
+
+ /* Enable timeout only if instrumentation is enabled */
+ if (es->analyze)
+ enable_timeout_every(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
+ progressive_explain_interval),
+ progressive_explain_interval);
+
+ /* Print progressive plan for the first time */
+ ProgressiveExplainPrint(queryDesc);
+}
+
+/*
+ * ProgressiveExplainUpdate
+ * Updates progressive explain for instrumented runs.
+ */
+void
+ProgressiveExplainUpdate(PlanState *node)
+{
+ /* Track the current PlanState */
+ node->state->query_desc->pestate->pe_curr_node = node;
+ ProgressiveExplainPrint(node->state->query_desc);
+ node->state->query_desc->pestate->pe_curr_node = NULL;
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+}
+
+/*
+ * ProgressiveExplainPrint -
+ * Updates progressive explain in memory.
+ *
+ * This function resets the reusable ExplainState, updates the
+ * plan and updates the DSA with new data.
+ *
+ * Memory allocation in the global DSA is also done here. The amount
+ * of shared memory allocated depends on the size of the currently
+ * printed plan. Subsequent calls may reallocate the area if a new
+ * plan does not fit in the existing one.
+ */
+void
+ProgressiveExplainPrint(QueryDesc *queryDesc)
+{
+ bool alloc_needed = false;
+ QueryDesc *currentQueryDesc = queryDesc;
+ ProgressiveExplainData *pe_data;
+ ExplainState *es = queryDesc->pestate;
+ Size size = 0;
+ dsa_area *a;
+
+ /* Reset the string to be reused */
+ resetStringInfo(es->str);
+
+ /* Print the plan */
+ ExplainBeginOutput(es);
+ ExplainPrintPlan(es, currentQueryDesc);
+ ExplainEndOutput(es);
+
+ /*
+ * At this point we are certain that the common DSA area was already
+ * created. Just attach to it without any lock.
+ */
+ a = dsa_attach(*progressiveExplainDSAHandle);
+
+ /* Exclusive access is needed to update the data */
+ LWLockAcquire(&MyProc->peLock, LW_EXCLUSIVE);
+
+ /* Plan was never printed */
+	if (!DsaPointerIsValid(MyProc->peDSAPointer))
+ alloc_needed = true;
+ else
+ {
+ pe_data = dsa_get_address(a, MyProc->peDSAPointer);
+
+		/*
+		 * Plan (including the terminating NUL) does not fit in the existing
+		 * shared memory area. Reallocation is needed.
+		 */
+		if (strlen(es->str->data) + 1 > pe_data->plan_alloc_size)
+ {
+ dsa_free(a, MyProc->peDSAPointer);
+ alloc_needed = true;
+ }
+ }
+
+ if (alloc_needed)
+ {
+ /*
+ * The allocated size combines the length of the currently printed
+ * query plan with an additional delta defined by
+ * PROGRESSIVE_EXPLAIN_FREE_SIZE. This strategy prevents having to
+		 * reallocate the segment very often, which would otherwise be needed
+		 * whenever the length of the next printed plan exceeds the previously
+		 * allocated size.
+ */
+ size = add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_FREE_SIZE);
+ MyProc->peDSAPointer = dsa_allocate(a,
+ add_size(sizeof(ProgressiveExplainData), size));
+ pe_data = dsa_get_address(a, MyProc->peDSAPointer);
+ pe_data->plan_alloc_size = size;
+ }
+
+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_update = GetCurrentTimestamp();
+
+ LWLockRelease(&MyProc->peLock);
+
+ dsa_detach(a);
+}
+
+/*
+ * ProgressiveExplainFinish -
+ * Finalizes query execution with progressive explain enabled.
+ */
+void
+ProgressiveExplainFinish(QueryDesc *queryDesc)
+{
+ /*
+	 * Progressive explain is only done for the outermost query descriptor.
+ */
+ if (queryDesc == activeQueryDesc)
+ ProgressiveExplainCleanup(true);
+}
+
+/*
+ * ProgressiveExplainIsActive -
+ * Checks if argument queryDesc is the one being tracked.
+ */
+bool
+ProgressiveExplainIsActive(QueryDesc *queryDesc)
+{
+ return queryDesc == activeQueryDesc;
+}
+
+/*
+ * End-of-transaction cleanup for progressive explains.
+ */
+void
+AtEOXact_ProgressiveExplain(bool isCommit)
+{
+ /* Only perform cleanup if query descriptor is being tracked */
+ if (activeQueryDesc != NULL)
+ {
+ if (isCommit)
+ elog(WARNING, "leaked progressive explain query descriptor");
+ ProgressiveExplainCleanup(isCommit);
+ }
+}
+
+/*
+ * End-of-subtransaction cleanup for progressive explains.
+ */
+void
+AtEOSubXact_ProgressiveExplain(bool isCommit, int nestDepth)
+{
+	/*
+	 * Only perform cleanup if progressive explain is enabled
+	 * (activeQueryXactNestLevel != -1) and the nesting level of the ended
+	 * subtransaction is greater than or equal to the level of the tracked
+	 * query. This is to avoid doing cleanup in subtransaction aborts
+	 * triggered by exception blocks in functions and procedures.
+	 */
+ if (activeQueryXactNestLevel >= nestDepth)
+ {
+ if (isCommit)
+ elog(WARNING, "leaked progressive explain query descriptor");
+ ProgressiveExplainCleanup(isCommit);
+ }
+}
+
+/*
+ * ProgressiveExplainCleanup -
+ * Cleanup routine when progressive explain is enabled.
+ *
+ * This function resets the local tracking variables, clears
+ * the DSA pointer in the local backend's PGPROC and, if the
+ * query ended gracefully, frees the memory allocated in the
+ * global DSA.
+ */
+void
+ProgressiveExplainCleanup(bool isCommit)
+{
+ dsa_pointer p;
+ dsa_area *a;
+
+ /* Stop timeout */
+ disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
+
+ /* Reset timeout flag */
+ ProgressiveExplainPending = false;
+
+ /* Reset querydesc tracker and nested level */
+ activeQueryDesc = NULL;
+ activeQueryXactNestLevel = -1;
+
+ /* Clear the local backend's DSA pointer */
+ LWLockAcquire(&MyProc->peLock, LW_EXCLUSIVE);
+ p = MyProc->peDSAPointer;
+	MyProc->peDSAPointer = InvalidDsaPointer;
+ LWLockRelease(&MyProc->peLock);
+
+	/* Graceful execution, manually clean the allocated area */
+	if (isCommit && DsaPointerIsValid(p))
+ {
+ /*
+ * At this point we are certain that the common DSA area was already
+ * created. Just attach to it without any lock.
+ */
+ a = dsa_attach(*progressiveExplainDSAHandle);
+ dsa_free(a, p);
+ dsa_detach(a);
+ }
+}
+
+/*
+ * ExecProcNodeInstrExplain -
+ * ExecProcNode wrapper that performs instrumentation calls and updates
+ * progressive explains. By keeping this a separate function, we add
+ * overhead only when instrumented progressive explain is enabled.
+ */
+TupleTableSlot *
+ExecProcNodeInstrExplain(PlanState *node)
+{
+ TupleTableSlot *result;
+
+ InstrStartNode(node->instrument);
+
+	/*
+	 * Update the progressive explain after the timeout has been reached.
+	 */
+ if (ProgressiveExplainPending)
+ ProgressiveExplainUpdate(node);
+
+ result = node->ExecProcNodeReal(node);
+
+ InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
+
+ return result;
+}
+
+/*
+ * ProgressiveExplainShmemSize
+ * Compute shared memory space needed for shared memory
+ * structures used by the progressive explain feature.
+ */
+Size
+ProgressiveExplainShmemSize(void)
+{
+ Size size = 0;
+
+ size = add_size(size, sizeof(dsa_handle));
+
+ return size;
+}
+
+/*
+ * ProgressiveExplainShmemInit -
+ * Initialize shared DSA handle.
+ *
+ * This handle will point to the global DSA area allocated
+ * by the first backend that attempts to perform progressive
+ * explains.
+ */
+void
+ProgressiveExplainShmemInit(void)
+{
+ bool found;
+
+ progressiveExplainDSAHandle = (dsa_handle *)
+ ShmemInitStruct("Progressive Explain Data",
+ sizeof(dsa_handle),
+ &found);
+
+	if (!found)
+		*progressiveExplainDSAHandle = 0;
+}
+
+/*
+ * pg_stat_progress_explain -
+ *		Return the set of in-progress progressive explain plans.
+ */
+Datum
+pg_stat_progress_explain(PG_FUNCTION_ARGS)
+{
+#define EXPLAIN_ACTIVITY_COLS 4
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ dsa_area *a;
+ ProgressiveExplainData *ped;
+
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Progressive explain DSA was not created yet so there is no progressive
+ * explain to show.
+ */
+ LWLockAcquire(ProgressiveExplainLock, LW_SHARED);
+ if (*progressiveExplainDSAHandle == 0)
+ {
+ LWLockRelease(ProgressiveExplainLock);
+ return (Datum) 0;
+ }
+ LWLockRelease(ProgressiveExplainLock);
+
+ /*
+ * At this point we are certain that the common DSA area was already
+ * created. Just attach to it without any lock.
+ */
+ a = dsa_attach(*progressiveExplainDSAHandle);
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ Datum values[EXPLAIN_ACTIVITY_COLS] = {0};
+ bool nulls[EXPLAIN_ACTIVITY_COLS] = {0};
+ LocalPgBackendStatus *local_beentry;
+ PGPROC *proc;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ proc = BackendPidGetProc(beentry->st_procpid);
+
+		/* We are only interested in processes with a PGPROC */
+ if (proc == NULL)
+ continue;
+
+ /*
+ * Make sure the target backend isn't updating the plan details in
+ * memory while we read it.
+ */
+ LWLockAcquire(&proc->peLock, LW_SHARED);
+
+ /*
+ * We don't look at a DSA that doesn't contain data yet, or at our own
+ * row.
+ */
+ if (!DsaPointerIsValid(proc->peDSAPointer) ||
+ MyProcPid == beentry->st_procpid)
+ {
+ LWLockRelease(&proc->peLock);
+ continue;
+ }
+
+ ped = dsa_get_address(a, proc->peDSAPointer);
+
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+		values[1] = Int32GetDatum(beentry->st_procpid);
+ values[2] = TimestampTzGetDatum(ped->last_update);
+
+ if (has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) ||
+			has_privs_of_role(GetUserId(), beentry->st_userid))
+ values[3] = CStringGetTextDatum(ped->plan);
+ else
+ values[3] = CStringGetTextDatum("<insufficient privilege>");
+
+ LWLockRelease(&proc->peLock);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ dsa_detach(a);
+ return (Datum) 0;
+}
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index dd4cde41d32..2bb0ac7d286 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -24,6 +24,7 @@ backend_sources += files(
'explain.c',
'explain_dr.c',
'explain_format.c',
+ 'explain_progressive.c',
'explain_state.c',
'extension.c',
'foreigncmds.c',
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2da848970be..89e9fb1bb04 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -43,6 +43,7 @@
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/partition.h"
+#include "commands/explain_progressive.h"
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -160,6 +161,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ /*
+ * Setup progressive explain if enabled.
+ */
+ if (progressive_explain)
+ ProgressiveExplainSetup(queryDesc);
+
/*
* If the transaction is read-only, we need to check if any writes are
* planned to non-temporary tables. EXPLAIN is considered read-only.
@@ -185,6 +192,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
estate = CreateExecutorState();
queryDesc->estate = estate;
+ /*
+	 * Add a back-reference to the QueryDesc.
+ */
+ estate->query_desc = queryDesc;
+
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
/*
@@ -270,6 +282,12 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
InitPlan(queryDesc, eflags);
+ /*
+ * Start progressive explain if enabled.
+ */
+ if (progressive_explain)
+ ProgressiveExplainStart(queryDesc);
+
MemoryContextSwitchTo(oldcontext);
return ExecPlanStillValid(queryDesc->estate);
@@ -519,6 +537,9 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ /* Finish progressive explain if enabled */
+ ProgressiveExplainFinish(queryDesc);
+
estate->es_finished = true;
}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..7ca0544e45e 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -72,6 +72,7 @@
*/
#include "postgres.h"
+#include "commands/explain_progressive.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -118,6 +119,7 @@
#include "executor/nodeWorktablescan.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
+#include "utils/guc.h"
static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
@@ -462,7 +464,19 @@ ExecProcNodeFirst(PlanState *node)
* have ExecProcNode() directly call the relevant function from now on.
*/
if (node->instrument)
- node->ExecProcNode = ExecProcNodeInstr;
+ {
+ /*
+ * Use instrumented wrapper for progressive explains only if the
+ * feature is enabled, is configured to update the plan more than once
+ * and the node belongs to the currently tracked query descriptor.
+ */
+ if (progressive_explain &&
+ progressive_explain_interval > 0 &&
+ ProgressiveExplainIsActive(node->state->query_desc))
+ node->ExecProcNode = ExecProcNodeInstrExplain;
+ else
+ node->ExecProcNode = ExecProcNodeInstr;
+ }
else
node->ExecProcNode = node->ExecProcNodeReal;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f4700..6a160ee254f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -25,6 +25,8 @@ static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void InstrEndLoopInternal(Instrumentation *instr, bool force);
+
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -137,7 +139,7 @@ InstrUpdateTupleCount(Instrumentation *instr, double nTuples)
/* Finish a run cycle for a plan node */
void
-InstrEndLoop(Instrumentation *instr)
+InstrEndLoopInternal(Instrumentation *instr, bool force)
{
double totaltime;
@@ -145,7 +147,7 @@ InstrEndLoop(Instrumentation *instr)
if (!instr->running)
return;
- if (!INSTR_TIME_IS_ZERO(instr->starttime))
+ if (!INSTR_TIME_IS_ZERO(instr->starttime) && !force)
elog(ERROR, "InstrEndLoop called on running node");
/* Accumulate per-cycle statistics into totals */
@@ -164,6 +166,20 @@ InstrEndLoop(Instrumentation *instr)
instr->tuplecount = 0;
}
+/* Safely finish a run cycle for a plan node */
+void
+InstrEndLoop(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, false);
+}
+
+/* Forcibly finish a run cycle for a plan node */
+void
+InstrEndLoopForce(Instrumentation *instr)
+{
+ InstrEndLoopInternal(instr, true);
+}
+
/* aggregate instrumentation information */
void
InstrAggNode(Instrumentation *dst, Instrumentation *add)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..7858498df92 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/explain_progressive.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, ProgressiveExplainShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -302,6 +304,11 @@ CreateOrAttachShmemStructs(void)
*/
PredicateLockShmemInit();
+ /*
+	 * Set up shared state for progressive explain
+ */
+ ProgressiveExplainShmemInit();
+
/*
* Set up process table
*/
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3df29658f18..0ad668660b2 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -178,6 +178,8 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
[LWTRANCHE_AIO_URING_COMPLETION] = "AioUringCompletion",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN] = "ProgressiveExplain",
+ [LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA] = "ProgressiveExplainDSA",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 066319afe2b..a90e7a452b1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -269,6 +269,9 @@ InitProcGlobal(void)
LWLockInitialize(&(proc->fpInfoLock), LWTRANCHE_LOCK_FASTPATH);
}
+ /* Per-backend progressive explain locking. */
+ LWLockInitialize(&(proc->peLock), LWTRANCHE_PROGRESSIVE_EXPLAIN);
+
/*
* Newly created PGPROCs for normal backends, autovacuum workers,
* special workers, bgworkers, and walsenders must be queued up on the
@@ -479,6 +482,9 @@ InitProcess(void)
MyProc->clogGroupMemberLsn = InvalidXLogRecPtr;
Assert(pg_atomic_read_u32(&MyProc->clogGroupNext) == INVALID_PROC_NUMBER);
+ /* Initialize progressive explain field. */
+	MyProc->peDSAPointer = InvalidDsaPointer;
+
/*
* Acquire ownership of the PGPROC's latch, so that we can use WaitLatch
* on it. That allows us to repoint the process latch, which so far
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 8164d0fbb4f..081966ca267 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -102,6 +102,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
/* not yet executed */
qd->already_executed = false;
+ /* null until set by progressive explains */
+ qd->pestate = NULL;
+
return qd;
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 4f44648aca8..7eb0614cf2f 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -351,6 +351,7 @@ DSMRegistry "Waiting to read or update the dynamic shared memory registry."
InjectionPoint "Waiting to read or update information related to injection points."
SerialControl "Waiting to read or update shared <filename>pg_serial</filename> state."
AioWorkerSubmissionQueue "Waiting to access AIO worker submission queue."
+ProgressiveExplain "Waiting to access progressive explain shared memory."
#
# END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7958ea11b73..e070509b403 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_tablespace.h"
+#include "commands/explain_progressive.h"
#include "libpq/auth.h"
#include "libpq/libpq-be.h"
#include "mb/pg_wchar.h"
@@ -82,6 +83,7 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void ProgressiveExplainTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -771,6 +773,8 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(PROGRESSIVE_EXPLAIN_TIMEOUT,
+ ProgressiveExplainTimeoutHandler);
}
/*
@@ -1432,6 +1436,12 @@ ClientCheckTimeoutHandler(void)
SetLatch(MyLatch);
}
+static void
+ProgressiveExplainTimeoutHandler(void)
+{
+ ProgressiveExplainPending = true;
+}
+
/*
* Returns true if at least one role is defined in this database cluster.
*/
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4eaeca89f2c..96fcbe8e8d6 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -41,6 +41,8 @@
#include "commands/async.h"
#include "commands/extension.h"
#include "commands/event_trigger.h"
+#include "commands/explain_progressive.h"
+#include "commands/explain_state.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
#include "commands/user.h"
@@ -533,6 +535,15 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+bool progressive_explain = false;
+bool progressive_explain_verbose = false;
+bool progressive_explain_settings = false;
+bool progressive_explain_timing = true;
+bool progressive_explain_buffers = false;
+bool progressive_explain_wal = false;
+bool progressive_explain_costs = true;
+int progressive_explain_interval = 0;
+int progressive_explain_format = EXPLAIN_FORMAT_TEXT;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -2131,6 +2142,83 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Enables progressive explains."),
+ gettext_noop("Explain output is visible via pg_stat_progress_explain."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_verbose", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether verbose details are added to progressive explains."),
+ gettext_noop("Equivalent to the VERBOSE option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_verbose,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_settings", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on modified configuration is added to progressive explains."),
+ gettext_noop("Equivalent to the SETTINGS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_settings,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_timing", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on per node timing is added to progressive explains."),
+ gettext_noop("Equivalent to the TIMING option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_timing,
+ true,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_buffers", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on buffer usage is added to progressive explains."),
+ gettext_noop("Equivalent to the BUFFERS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_buffers,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_wal", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on WAL record generation is added to progressive explains."),
+ gettext_noop("Equivalent to the WAL option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_wal,
+ false,
+ NULL, NULL, NULL
+ },
+
+ {
+ {"progressive_explain_costs", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Controls whether information on the estimated startup and total cost of each plan node is added to progressive explains."),
+ gettext_noop("Equivalent to the COSTS option of EXPLAIN."),
+ GUC_EXPLAIN
+ },
+ &progressive_explain_costs,
+ true,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
@@ -3848,6 +3936,18 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"progressive_explain_interval", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Sets the interval between instrumented progressive "
+ "explains."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &progressive_explain_interval,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -5396,6 +5496,16 @@ struct config_enum ConfigureNamesEnum[] =
NULL, assign_io_method, NULL
},
+ {
+ {"progressive_explain_format", PGC_USERSET, STATS_MONITORING,
+ gettext_noop("Selects the EXPLAIN output format to be used with progressive explains."),
+ gettext_noop("Equivalent to the FORMAT option of EXPLAIN.")
+ },
+ &progressive_explain_format,
+ EXPLAIN_FORMAT_TEXT, explain_format_options,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ff56a1f0732..3d45b63b7c5 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -670,6 +670,19 @@
#log_executor_stats = off
+# - Progressive Explain -
+
+#progressive_explain = off
+#progressive_explain_interval = 0	# in milliseconds; 0 disables periodic plan updates
+#progressive_explain_format = text
+#progressive_explain_settings = off
+#progressive_explain_verbose = off
+#progressive_explain_buffers = off
+#progressive_explain_wal = off
+#progressive_explain_timing = on
+#progressive_explain_costs = on
+
+
#------------------------------------------------------------------------------
# VACUUMING
#------------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8b68b16d79d..69092d9ccc8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12493,4 +12493,14 @@
proargtypes => 'int4',
prosrc => 'gist_stratnum_common' },
+{ oid => '8770',
+ descr => 'statistics: information about progress of backends running statements',
+ proname => 'pg_stat_progress_explain', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => '',
+ proallargtypes => '{oid,int4,timestamptz,text}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{datid,pid,last_update,query_plan}',
+ prosrc => 'pg_stat_progress_explain' },
+
]
diff --git a/src/include/commands/explain_progressive.h b/src/include/commands/explain_progressive.h
new file mode 100644
index 00000000000..f92ca805cec
--- /dev/null
+++ b/src/include/commands/explain_progressive.h
@@ -0,0 +1,42 @@
+/*-------------------------------------------------------------------------
+ *
+ * explain_progressive.h
+ * prototypes for explain_progressive.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * src/include/commands/explain_progressive.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXPLAIN_PROGRESSIVE_H
+#define EXPLAIN_PROGRESSIVE_H
+
+#include "datatype/timestamp.h"
+#include "executor/executor.h"
+
+typedef struct ProgressiveExplainData
+{
+ int plan_alloc_size;
+ TimestampTz last_update;
+ char plan[FLEXIBLE_ARRAY_MEMBER];
+} ProgressiveExplainData;
+
+extern bool ProgressiveExplainIsActive(QueryDesc *queryDesc);
+extern void ProgressiveExplainSetup(QueryDesc *queryDesc);
+extern void ProgressiveExplainStart(QueryDesc *queryDesc);
+extern void ProgressiveExplainTrigger(void);
+extern void ProgressiveExplainUpdate(PlanState *node);
+extern void ProgressiveExplainFinish(QueryDesc *queryDesc);
+extern Size ProgressiveExplainShmemSize(void);
+extern void ProgressiveExplainShmemInit(void);
+extern TupleTableSlot *ExecProcNodeInstrExplain(PlanState *node);
+
+/* transaction cleanup code */
+extern void AtEOXact_ProgressiveExplain(bool isCommit);
+extern void AtEOSubXact_ProgressiveExplain(bool isCommit, int nestDepth);
+
+extern PGDLLIMPORT bool ProgressiveExplainPending;
+
+#endif /* EXPLAIN_PROGRESSIVE_H */
diff --git a/src/include/commands/explain_state.h b/src/include/commands/explain_state.h
index 32728f5d1a1..af5469d821e 100644
--- a/src/include/commands/explain_state.h
+++ b/src/include/commands/explain_state.h
@@ -16,6 +16,7 @@
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "parser/parse_node.h"
+#include "utils/dsa.h"
typedef enum ExplainSerializeOption
{
@@ -74,6 +75,12 @@ typedef struct ExplainState
/* extensions */
void **extension_state;
int extension_state_allocated;
+ /* set if tracking a progressive explain */
+ bool progressive;
+ /* current plan node in progressive explains */
+ struct PlanState *pe_curr_node;
+ /* reusable instr object used in progressive explains */
+ struct Instrumentation *pe_local_instr;
} ExplainState;
typedef void (*ExplainOptionHandler) (ExplainState *, DefElem *, ParseState *);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..27692aee542 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -48,6 +48,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pestate; /* progressive explain state if enabled */
/* This field is set by ExecutePlan */
bool already_executed; /* true if previously executed */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6c..21de2a5632d 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -109,6 +109,7 @@ extern void InstrStartNode(Instrumentation *instr);
extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
+extern void InstrEndLoopForce(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5b6cadb5a6c..b7d2d0458de 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -57,6 +57,7 @@ struct ExprState;
struct ExprContext;
struct RangeTblEntry; /* avoid including parsenodes.h here */
struct ExprEvalStep; /* avoid including execExpr.h everywhere */
+struct QueryDesc; /* avoid including execdesc.h here */
struct CopyMultiInsertBuffer;
struct LogicalTapeSet;
@@ -769,6 +770,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ /* Reference to query descriptor */
+ struct QueryDesc *query_desc;
} EState;
@@ -1165,6 +1169,8 @@ typedef struct PlanState
ExecProcNodeMtd ExecProcNode; /* function to return next tuple */
ExecProcNodeMtd ExecProcNodeReal; /* actual function, if above is a
* wrapper */
+	ExecProcNodeMtd ExecProcNodeOriginal;	/* pointer to the original function
+											 * if another wrapper was added */
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 4df1d25c045..f1f949b7005 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,8 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
LWTRANCHE_AIO_URING_COMPLETION,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN,
+ LWTRANCHE_PROGRESSIVE_EXPLAIN_DSA,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 932024b1b0b..3c543953eaa 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -84,3 +84,4 @@ PG_LWLOCK(50, DSMRegistry)
PG_LWLOCK(51, InjectionPoint)
PG_LWLOCK(52, SerialControl)
PG_LWLOCK(53, AioWorkerSubmissionQueue)
+PG_LWLOCK(54, ProgressiveExplain)
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index f51b03d3822..1224e56a6c0 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -22,6 +22,7 @@
#include "storage/pg_sema.h"
#include "storage/proclist_types.h"
#include "storage/procnumber.h"
+#include "utils/dsa.h"
/*
* Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
@@ -311,6 +312,10 @@ struct PGPROC
PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */
dlist_head lockGroupMembers; /* list of members, if I'm a leader */
dlist_node lockGroupLink; /* my member link, if I'm a member */
+
+ dsa_pointer peDSAPointer; /* DSA pointer to progressive explain data */
+ /* Protects per-backend progressive explain data */
+ LWLock peLock;
};
/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index f619100467d..6bb9d36b003 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -278,6 +278,15 @@ extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
extern PGDLLIMPORT double log_xact_sample_rate;
+extern PGDLLIMPORT bool progressive_explain;
+extern PGDLLIMPORT int progressive_explain_interval;
+extern PGDLLIMPORT int progressive_explain_format;
+extern PGDLLIMPORT bool progressive_explain_verbose;
+extern PGDLLIMPORT bool progressive_explain_settings;
+extern PGDLLIMPORT bool progressive_explain_timing;
+extern PGDLLIMPORT bool progressive_explain_buffers;
+extern PGDLLIMPORT bool progressive_explain_wal;
+extern PGDLLIMPORT bool progressive_explain_costs;
extern PGDLLIMPORT char *backtrace_functions;
extern PGDLLIMPORT int temp_file_limit;
@@ -322,6 +331,7 @@ extern PGDLLIMPORT const struct config_enum_entry io_method_options[];
extern PGDLLIMPORT const struct config_enum_entry recovery_target_action_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_level_options[];
extern PGDLLIMPORT const struct config_enum_entry wal_sync_method_options[];
+extern PGDLLIMPORT const struct config_enum_entry explain_format_options[];
/*
* Functions exported by guc.c
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 7b19beafdc9..f2751c5b4df 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -36,6 +36,7 @@ typedef enum TimeoutId
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
+ PROGRESSIVE_EXPLAIN_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
/* Maximum number of timeout reasons */
diff --git a/src/test/modules/test_misc/t/008_progressive_explain.pl b/src/test/modules/test_misc/t/008_progressive_explain.pl
new file mode 100644
index 00000000000..895031524ec
--- /dev/null
+++ b/src/test/modules/test_misc/t/008_progressive_explain.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+#
+# Test progressive explain
+#
+# We need to make sure pg_stat_progress_explain does not show rows for the
+# local session, even if progressive explain is enabled. For other sessions
+# pg_stat_progress_explain should contain data for a PID only if
+# progressive_explain is enabled and a query is running. Data needs to be
+# removed when the query finishes (or gets cancelled).
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize node
+my $node = PostgreSQL::Test::Cluster->new('progressive_explain');
+
+$node->init;
+# Print the progressive explain plan only once, at query start (no periodic updates)
+$node->append_conf('postgresql.conf', 'progressive_explain_interval = 0');
+$node->start;
+
+# Test for local session
+sub test_local_session
+{
+ my $setting = $_[0];
+ # Make sure local session does not appear in pg_stat_progress_explain
+ my $count = $node->safe_psql(
+ 'postgres', qq[
+ SET progressive_explain='$setting';
+ SELECT count(*) from pg_stat_progress_explain WHERE pid = pg_backend_pid()
+ ]);
+
+ ok($count == "0",
+ "Session cannot see its own explain with progressive_explain set to ${setting}");
+}
+
+# Tests for peer session
+sub test_peer_session
+{
+ my $setting = $_[0];
+ my $ret;
+
+ # Start a background session and get its PID
+ my $background_psql = $node->background_psql(
+ 'postgres',
+ on_error_stop => 0);
+
+ my $pid = $background_psql->query_safe(
+ qq[
+ SELECT pg_backend_pid();
+ ]);
+
+ # Configure progressive explain in background session and run a simple query
+ # letting it finish
+ $background_psql->query_safe(
+ qq[
+ SET progressive_explain='$setting';
+ SELECT 1;
+ ]);
+
+ # Check that pg_stat_progress_explain contains no row for the PID that finished
+ # its query gracefully
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for completed query with progressive_explain set to ${setting}");
+
+ # Start query in background session and leave it running
+ $background_psql->query_until(
+ qr/start/, q(
+ \echo start
+ SELECT pg_sleep(600);
+ ));
+
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ # If progressive_explain is disabled pg_stat_progress_explain should not contain
+ # row for PID
+ if ($setting eq 'off') {
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for running query with progressive_explain set to ${setting}");
+ }
+ # 1 row for pid is expected for running query
+ else {
+ ok($ret == "1",
+ "pg_stat_progress_explain with 1 row for running query with progressive_explain set to ${setting}");
+ }
+
+ # Terminate running query and make sure it is gone
+ $node->safe_psql(
+ 'postgres', qq[
+ SELECT pg_cancel_backend($pid);
+ ]);
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT count(*) = 0 FROM pg_stat_activity
+ WHERE pid = $pid and state = 'active'
+ ]);
+
+ # Check again pg_stat_progress_explain and confirm that the existing row is
+ # now gone
+ $ret = $node->safe_psql(
+ 'postgres', qq[
+ SELECT count(*) FROM pg_stat_progress_explain where pid = $pid
+ ]);
+
+ ok($ret == "0",
+ "pg_stat_progress_explain empty for canceled query with progressive_explain set to ${setting}");
+}
+
+# Run tests for the local session
+test_local_session('off');
+test_local_session('on');
+
+# Run tests for peer session
+test_peer_session('off');
+test_peer_session('on');
+
+$node->stop;
+done_testing();
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 47478969135..62b70cf4618 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2041,6 +2041,13 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_explain| SELECT s.datid,
+ d.datname,
+ s.pid,
+ s.last_update,
+ s.query_plan
+ FROM (pg_stat_progress_explain() s(datid, pid, last_update, query_plan)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b66cecd8799..1bb65913985 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2302,6 +2302,7 @@ ProcessUtilityContext
ProcessUtility_hook_type
ProcessingMode
ProgressCommandType
+ProgressiveExplainData
ProjectSet
ProjectSetPath
ProjectSetState
--
2.43.0
On 2025-04-01 15:23, Rafael Thofehrn Castro wrote:
Hello again,
ERROR: could not attach to dynamic shared area
In addition to that refactoring issue, the current patch had a race
condition in pg_stat_progress_explain to access the DSA of a process
running a query that gets aborted.

While discussing with Robert we agreed that it would be wiser to take
a step back and change the strategy used to share progressive explain
data in shared memory.

Instead of using per-backend DSAs shared via a hash structure I now
define a dsa_pointer and a LWLock in each backend's PGPROC.

A global DSA is created by the first backend that attempts to use
the progressive explain feature. After the DSA is created, subsequent
uses of the feature will just allocate memory there and reference it
via PGPROC's dsa_pointer.

This solves the race condition reported by Torikoshi and improves
concurrency performance, as we no longer have a global LWLock
controlling shared memory access, but a per-backend one.

Performed the same tests done by Torikoshi and it looks like we are
good now, even with more frequent inspects in pg_stat_progress_explain
(\watch 0.01).
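A minimal sketch of that polling session (assuming the view columns of the
current patch revision) is simply:

    SELECT pid, last_update, query_plan
    FROM pg_stat_progress_explain;
    \watch 0.01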
Thanks for updating the patch!
Have you tested enabling progressive_explain?
When I ran the 'make installcheck' test again setting
progressive_explain to on, there was the same assertion failure:
TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File:
"ruleutils.c", Line: 8802, PID: 116832
postgres: parallel worker for PID 116822
(ExceptionalCondition+0x98)[0xb7311ea8bf80]
postgres: parallel worker for PID 116822 (+0x89de6c)[0xb7311e9ede6c]
postgres: parallel worker for PID 116822 (+0x89eb68)[0xb7311e9eeb68]
postgres: parallel worker for PID 116822 (+0x89e78c)[0xb7311e9ee78c]
postgres: parallel worker for PID 116822 (+0x8a1d10)[0xb7311e9f1d10]
postgres: parallel worker for PID 116822 (+0x89ed80)[0xb7311e9eed80]
postgres: parallel worker for PID 116822 (+0x89e78c)[0xb7311e9ee78c]
postgres: parallel worker for PID 116822 (+0x89f174)[0xb7311e9ef174]
postgres: parallel worker for PID 116822 (+0x89e78c)[0xb7311e9ee78c]
postgres: parallel worker for PID 116822 (+0x89f0b8)[0xb7311e9ef0b8]
postgres: parallel worker for PID 116822 (+0x8928dc)[0xb7311e9e28dc]
postgres: parallel worker for PID 116822
(deparse_expression+0x34)[0xb7311e9e2834]
postgres: parallel worker for PID 116822 (+0x347870)[0xb7311e497870]
postgres: parallel worker for PID 116822 (+0x3478e4)[0xb7311e4978e4]
postgres: parallel worker for PID 116822 (+0x347970)[0xb7311e497970]
...
TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File:
"ruleutils.c", Line: 8802, PID: 116831
[115650]: LOCATION: LogChildExit, postmaster.c:2846
was terminated by signal 6: Aborted
[115650]: LOCATION: LogChildExit, postmaster.c:2846
off, summary off, timing off, buffers off) select count(*) from ab where
(a = (select 1) or a = (select 3)) and b = 2
[115650]: LOCATION: LogChildExit, postmaster.c:2846
We can reproduce it as follows:
show progressive_explain;
progressive_explain
---------------------
on
create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in(2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in(1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in(3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 2;
explain (analyze, costs off, summary off, timing off, buffers off)
select count(*) from ab where (a = (select 1) or a = (select 3)) and b = 2;
WARNING: terminating connection because of crash of another server
process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Note that there is no need to access pg_stat_progress_explain.
Could you please check if you can reproduce this?
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
On Tue, Apr 1, 2025 at 9:38 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
Could you please check if you can reproduce this?
I can, and I now see that this patch has a pretty big design problem.
The assertion failure occurs when a background worker tries to call
ruleutils.c's get_parameter(), which tries to find the expression from
which the parameter was computed. To do that, it essentially looks
upward in the plan tree until it finds the source of the parameter.
For example, if the parameter came from the bar_id column of relation
foo, we would then deparse the parameter as foo.bar_id, rather than as
$1 or similar. However, this doesn't work when the deparsing is
attempted from within a parallel worker, because the parallel worker
only gets the portion of the plan it's attempting to execute, not the
whole thing. In your test case, the whole plan looks like this:
Aggregate
InitPlan 1
-> Result
InitPlan 2
-> Result
-> Gather
Workers Planned: 2
-> Parallel Append
-> Parallel Seq Scan on ab_a1_b2 ab_1
Filter: ((b = 2) AND ((a = (InitPlan 1).col1) OR
(a = (InitPlan 2).col1)))
-> Parallel Seq Scan on ab_a2_b2 ab_2
Filter: ((b = 2) AND ((a = (InitPlan 1).col1) OR
(a = (InitPlan 2).col1)))
-> Parallel Seq Scan on ab_a3_b2 ab_3
Filter: ((b = 2) AND ((a = (InitPlan 1).col1) OR
(a = (InitPlan 2).col1)))
Those references to (InitPlan 1).col1 are actually Params. I think
what's happening here is approximately that the worker tries to find
the source of those Params, but (InitPlan 1) is above the Gather node
and thus not available to the worker, and so the worker can't find it
and the assertion fails.
In one sense, it is not hard to fix this: the workers shouldn't really
be doing progressive_explain at all, because then we get a
progressive_explain from each process individually instead of one for
the query as a whole, so we could just think of having the workers
ignore the progressive_explain GUC. However, one thing I realized
earlier this morning is that the progressive EXPLAIN can't show any of
the instrumentation that is relayed back from workers to the leader
only at the end of the execution. See the code in ParallelQueryMain()
just after ExecutorFinish().
What I think this means is that this patch needs significant
rethinking to cope with parallel query. I don't think users are going
to be happy with a progressive EXPLAIN that just ignores what the
workers are doing, and I don't think they're going to be happy with N
separate progressive explains that they have to merge together to get
an overall picture of what the query is doing, and I'm absolutely
positive that they're not going to be happy with something that
crashes. I think there may be a way to come up with a good design here
that avoids these problems, but we definitely don't have time before
feature freeze (not that we were necessarily in a great place to think
of committing this before feature freeze anyway, but it definitely
doesn't make sense now that I understand this problem).
--
Robert Haas
EDB: http://www.enterprisedb.com
My bad, I mistakenly did the tests without assertions enabled in the last 2
days, so I couldn't catch that assertion failure. Was able to reproduce it,
thanks.
I guess when the code was designed we were not expecting to be doing
explains
in parallel workers.
One comment is that this has nothing to do with instrumentation. So the
hacks
done with instrument objects are not to blame here.
What is interesting is that removing that Assert (not saying we should do
that) fixes it and there don't seem to be any other Asserts complaining
anywhere. The feature works as expected and there are no crashes.
What I think this means is that this patch needs significant
rethinking to cope with parallel query. I don't think users are going
to be happy with a progressive EXPLAIN that just ignores what the
workers are doing, and I don't think they're going to be happy with N
separate progressive explains that they have to merge together to get
an overall picture of what the query is doing, and I'm absolutely
positive that they're not going to be happy with something that
crashes. I think there may be a way to come up with a good design here
that avoids these problems, but we definitely don't have time before
feature freeze (not that we were necessarily in a great place to think
of committing this before feature freeze anyway, but it definitely
doesn't make sense now that I understand this problem).
Yeah, that is a fair point. But I actually envisioned this feature to also
target parallel workers, from the start. I see a lot of value in being able
to debug what each parallel worker is doing inside its own black box. It
could be that a worker is lagging behind the others because it is dealing
with non-cached data blocks, for example.
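A query along these lines over the proposed view (a sketch, assuming the
view definition in the current patch revision) is already enough to spot a
worker stuck on I/O while its siblings make progress:

    SELECT a.pid, a.backend_type, a.wait_event_type, a.wait_event,
           e.last_update, e.query_plan
    FROM pg_stat_progress_explain e
    JOIN pg_stat_activity a USING (pid)
    ORDER BY a.pid;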
According to my tests the parallel workers can actually push instrumentation
to the parent while still running. It all depends on the types of operations
being performed in the plan.

For example, if the parallel worker is doing a parallel seq scan, then it will
continuously send chunks of rows to the parent and instrumentation goes with
the data. So we don't actually need to wait for the workers to finish before
instrumentation on the parent gets updated. Here is what I got from
Torikoshi's test after removing that Assert and enabling instrumented
progressive explain (progressive_explain_interval = '10ms'):
-[ RECORD 1
]-------------------------------------------------------------------------------------------------------------------------------------------------
datid | 5
datname | postgres
pid | 169555
last_update | 2025-04-01 12:44:07.36775-03
query_plan | Gather (cost=0.02..137998.03 rows=10000002 width=8) (actual
time=0.715..868.619 rows=9232106.00 loops=1) (current)
+
| Workers Planned: 2
+
| Workers Launched: 2
+
| InitPlan 1
+
| -> Result (cost=0.00..0.01 rows=1 width=4) (actual
time=0.002..0.002 rows=1.00 loops=1)
+
| InitPlan 2
+
| -> Result (cost=0.00..0.01 rows=1 width=4) (actual
time=0.000..0.000 rows=1.00 loops=1)
+
| -> Parallel Append (cost=0.00..137998.01 rows=4166669
width=8) (actual time=0.364..264.804 rows=1533011.00 loops=1)
+
| -> Seq Scan on ab_a2_b1 ab_2 (cost=0.00..0.00
rows=1 width=8) (never executed)
+
| Filter: ((b = 1) AND ((a = (InitPlan 1).col1)
OR (a = (InitPlan 2).col1)))
+
| -> Seq Scan on ab_a3_b1 ab_3 (cost=0.00..0.00
rows=1 width=8) (never executed)
+
| Filter: ((b = 1) AND ((a = (InitPlan 1).col1)
OR (a = (InitPlan 2).col1)))
+
| -> Parallel Seq Scan on ab_a1_b1 ab_1
(cost=0.00..117164.67 rows=4166667 width=8) (actual time=0.361..192.896
rows=1533011.00 loops=1)+
| Filter: ((b = 1) AND ((a = (InitPlan 1).col1)
OR (a = (InitPlan 2).col1)))
+
|
-[ RECORD 2
]-------------------------------------------------------------------------------------------------------------------------------------------------
datid | 5
datname | postgres
pid | 169596
last_update | 2025-04-01 12:44:07.361846-03
query_plan | Parallel Append (cost=0.00..137998.01 rows=4166669 width=8)
(actual time=0.990..666.845 rows=3839515.00 loops=1) (current)
+
| -> Seq Scan on ab_a2_b1 ab_2 (cost=0.00..0.00 rows=1
width=8) (never executed)
+
| Filter: ((b = 1) AND ((a = $0) OR (a = $1)))
+
| -> Seq Scan on ab_a3_b1 ab_3 (cost=0.00..0.00 rows=1
width=8) (never executed)
+
| Filter: ((b = 1) AND ((a = $0) OR (a = $1)))
+
| -> Parallel Seq Scan on ab_a1_b1 ab_1
(cost=0.00..117164.67 rows=4166667 width=8) (actual time=0.985..480.531
rows=3839515.00 loops=1) +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1)))
+
|
-[ RECORD 3
]-------------------------------------------------------------------------------------------------------------------------------------------------
datid | 5
datname | postgres
pid | 169597
last_update | 2025-04-01 12:44:07.36181-03
query_plan | Parallel Append (cost=0.00..137998.01 rows=4166669 width=8)
(actual time=1.003..669.398 rows=3830293.00 loops=1) (current)
+
| -> Seq Scan on ab_a2_b1 ab_2 (cost=0.00..0.00 rows=1
width=8) (never executed)
+
| Filter: ((b = 1) AND ((a = $0) OR (a = $1)))
+
| -> Seq Scan on ab_a3_b1 ab_3 (cost=0.00..0.00 rows=1
width=8) (actual time=0.019..0.019 rows=0.00 loops=1)
+
| Filter: ((b = 1) AND ((a = $0) OR (a = $1)))
+
| -> Parallel Seq Scan on ab_a1_b1 ab_1
(cost=0.00..117164.67 rows=4166667 width=8) (actual time=0.979..482.420
rows=3830293.00 loops=1) +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1)))
+
|
But yeah, if the parallel worker does a hash join and the HASH node is a huge
block of sub-operations, then yes, upstream will not see anything until the
HASH is computed. That is why IMO having visibility into background workers
has value too.
I even designed a query visualizer that combines plans of the parent and
parallel workers in side-by-side windows and includes other stats (per-process
wait events, CPU, memory consumption, temp file generation), allowing us to
correlate them with specific operations being done in the plans.
But Robert is right, the engineering required to make this happen isn't
easy. Changing that Assert can open a Pandora's box of other issues and we
have no time to do that for PG18.
Rafael.