Parallel leader process info in EXPLAIN
Hi,
While working on some slides explaining EXPLAIN, I couldn't resist the
urge to add the missing $SUBJECT. The attached 0001 patch gives the
following:
Gather ... time=0.146..33.077 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=4425
-> Parallel Seq Scan on public.t ... time=19.421..30.092 rows=0 loops=3)
Filter: (t.i = 42)
Rows Removed by Filter: 333333
Leader: actual time=0.013..32.025 rows=1 loops=1 <--- NEW
Buffers: shared hit=1546 <--- NEW
Worker 0: actual time=29.069..29.069 rows=0 loops=1
Buffers: shared hit=1126
Worker 1: actual time=29.181..29.181 rows=0 loops=1
Buffers: shared hit=1753
Without that, you have to deduce what work was done in the leader, but
I'd rather just show it.
The 0002 patch adjusts Sort for consistency with that scheme, so you get:
Sort ... time=84.303..122.238 rows=333333 loops=3)
Output: t1.i
Sort Key: t1.i
Leader: Sort Method: external merge Disk: 5864kB <--- DIFFERENT
Worker 0: Sort Method: external merge Disk: 3376kB
Worker 1: Sort Method: external merge Disk: 4504kB
Leader: actual time=119.624..165.949 rows=426914 loops=1
Worker 0: actual time=61.239..90.984 rows=245612 loops=1
Worker 1: actual time=72.046..109.782 rows=327474 loops=1
Without the "Leader" label, it's not really clear to the uninitiated
whether you're looking at combined, average or single process numbers.
Of course there are some more things that could be reported in a
similar way eventually, such as filter counters and hash join details.
For the XML/JSON/YAML formats, I decided to use a <Worker> element
with <Worker-Number>-1</Worker-Number> to indicate the leader.
Perhaps there should be a <Leader> element instead?
Thoughts?
Attachments:
0001-Show-parallel-leader-stats-in-EXPLAIN-output.patchapplication/octet-stream; name=0001-Show-parallel-leader-stats-in-EXPLAIN-output.patchDownload
From 336d28a42f43b396f3760101f96467662364d0ae Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 23 Oct 2019 16:17:57 +1300
Subject: [PATCH 1/2] Show parallel leader stats in EXPLAIN output.
Previously, we only showed VERBOSE information from parallel
workers. Show the leader's contribution too.
Author: Thomas Munro
---
src/backend/commands/explain.c | 214 ++++++++++++++++------------
src/backend/executor/execParallel.c | 18 +--
src/include/nodes/execnodes.h | 1 +
3 files changed, 130 insertions(+), 103 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62fb3434a3..a1fb22c9f7 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -105,7 +105,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instrument, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
@@ -1041,6 +1041,68 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
}
+static void
+ExplainNodePerProcess(ExplainState *es, bool *opened_group,
+ int worker_number, Instrumentation *instrument)
+{
+ double nloops = instrument->nloops;
+ double startup_ms;
+ double total_ms;
+ double rows;
+
+ if (nloops <= 0)
+ return;
+ startup_ms = 1000.0 * instrument->startup / nloops;
+ total_ms = 1000.0 * instrument->total / nloops;
+ rows = instrument->ntuples / nloops;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ if (worker_number < 0)
+ appendStringInfo(es->str, "Leader: ");
+ else
+ appendStringInfo(es->str, "Worker %d: ", worker_number);
+ if (es->timing)
+ appendStringInfo(es->str,
+ "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ "actual rows=%.0f loops=%.0f\n",
+ rows, nloops);
+ es->indent++;
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ es->indent--;
+ }
+ else
+ {
+ if (!*opened_group)
+ {
+ ExplainOpenGroup("Workers", "Workers", false, es);
+ *opened_group = true;
+ }
+ ExplainOpenGroup("Worker", NULL, true, es);
+ ExplainPropertyInteger("Worker Number", NULL, worker_number, es);
+
+ if (es->timing)
+ {
+ ExplainPropertyFloat("Actual Startup Time", "ms",
+ startup_ms, 3, es);
+ ExplainPropertyFloat("Actual Total Time", "ms",
+ total_ms, 3, es);
+ }
+ ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+ ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+
+ ExplainCloseGroup("Worker", NULL, true, es);
+ }
+}
+
/*
* ExplainNode -
* Appends a description of a plan tree to es->str
@@ -1067,6 +1129,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainState *es)
{
Plan *plan = planstate->plan;
+ Instrumentation *show_instrument;
const char *pname; /* node type name for text output */
const char *sname; /* node type name for non-text output */
const char *strategy = NULL;
@@ -1076,6 +1139,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
int save_indent = es->indent;
bool haschildren;
+ /* If this is a parallel query, show the combined stats. */
+ if (planstate->combined_instrument)
+ show_instrument = planstate->combined_instrument;
+ else
+ show_instrument = planstate->instrument;
+
switch (nodeTag(plan))
{
case T_Result:
@@ -1505,13 +1574,19 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (planstate->instrument)
InstrEndLoop(planstate->instrument);
- if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ /*
+ * If we have combined instrumentation from parallel workers, add the
+ * leader's contribution.
+ */
+ if (planstate->combined_instrument && planstate->instrument)
+ InstrAggNode(planstate->combined_instrument, planstate->instrument);
+
+ if (es->analyze && show_instrument && show_instrument->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = show_instrument->nloops;
+ double startup_ms = 1000.0 * show_instrument->startup / nloops;
+ double total_ms = 1000.0 * show_instrument->total / nloops;
+ double rows = show_instrument->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1586,26 +1661,26 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->indexqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
planstate->instrument->ntuples2, 0, es);
@@ -1619,11 +1694,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (es->analyze)
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
@@ -1641,7 +1716,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Gather:
{
@@ -1650,7 +1725,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -1696,7 +1771,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -1734,7 +1809,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -1748,7 +1823,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_TidScan:
{
@@ -1764,14 +1839,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -1781,7 +1856,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -1791,11 +1866,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -1804,11 +1879,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -1817,25 +1892,25 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Group:
show_group_keys(castNode(GroupState, planstate), ancestors, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -1851,7 +1926,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -1865,73 +1940,21 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->buffers && show_instrument)
+ show_buffer_usage(es, &show_instrument->bufusage);
/* Show worker detail */
if (es->analyze && es->verbose && planstate->worker_instrument)
{
WorkerInstrumentation *w = planstate->worker_instrument;
bool opened_group = false;
- int n;
- for (n = 0; n < w->num_workers; ++n)
- {
- Instrumentation *instrument = &w->instrument[n];
- double nloops = instrument->nloops;
- double startup_ms;
- double total_ms;
- double rows;
-
- if (nloops <= 0)
- continue;
- startup_ms = 1000.0 * instrument->startup / nloops;
- total_ms = 1000.0 * instrument->total / nloops;
- rows = instrument->ntuples / nloops;
+ /* Leader */
+ ExplainNodePerProcess(es, &opened_group, -1, planstate->instrument);
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "Worker %d: ", n);
- if (es->timing)
- appendStringInfo(es->str,
- "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
- startup_ms, total_ms, rows, nloops);
- else
- appendStringInfo(es->str,
- "actual rows=%.0f loops=%.0f\n",
- rows, nloops);
- es->indent++;
- if (es->buffers)
- show_buffer_usage(es, &instrument->bufusage);
- es->indent--;
- }
- else
- {
- if (!opened_group)
- {
- ExplainOpenGroup("Workers", "Workers", false, es);
- opened_group = true;
- }
- ExplainOpenGroup("Worker", NULL, true, es);
- ExplainPropertyInteger("Worker Number", NULL, n, es);
-
- if (es->timing)
- {
- ExplainPropertyFloat("Actual Startup Time", "ms",
- startup_ms, 3, es);
- ExplainPropertyFloat("Actual Total Time", "ms",
- total_ms, 3, es);
- }
- ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
- ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-
- if (es->buffers)
- show_buffer_usage(es, &instrument->bufusage);
-
- ExplainCloseGroup("Worker", NULL, true, es);
- }
- }
+ /* Workers */
+ for (int n = 0; n < w->num_workers; ++n)
+ ExplainNodePerProcess(es, &opened_group, n, &w->instrument[n]);
if (opened_group)
ExplainCloseGroup("Workers", "Workers", false, es);
@@ -2750,19 +2773,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instrument, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instrument)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instrument->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instrument->nfiltered1;
+ nloops = instrument->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -3280,7 +3303,8 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1,
+ mtstate->ps.instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 53cd2fc666..26a04692ab 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -987,20 +987,22 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
if (i >= instrumentation->num_plan_nodes)
elog(ERROR, "plan node %d not found", plan_node_id);
- /* Accumulate the statistics from all workers. */
- instrument = GetInstrumentationArray(instrumentation);
- instrument += i * instrumentation->num_workers;
- for (n = 0; n < instrumentation->num_workers; ++n)
- InstrAggNode(planstate->instrument, &instrument[n]);
-
/*
- * Also store the per-worker detail.
- *
* Worker instrumentation should be allocated in the same context as the
* regular instrumentation information, which is the per-query context.
* Switch into per-query memory context.
*/
oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
+
+ /* Accumulate the statistics from all workers. */
+ if (!planstate->combined_instrument)
+ planstate->combined_instrument = palloc0(sizeof(Instrumentation));
+ instrument = GetInstrumentationArray(instrumentation);
+ instrument += i * instrumentation->num_workers;
+ for (n = 0; n < instrumentation->num_workers; ++n)
+ InstrAggNode(planstate->combined_instrument, &instrument[n]);
+
+ /* Also store the per-worker detail. */
ibytes = mul_size(instrumentation->num_workers, sizeof(Instrumentation));
planstate->worker_instrument =
palloc(ibytes + offsetof(WorkerInstrumentation, instrument));
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 44f76082e9..cfeddbe5fc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -948,6 +948,7 @@ typedef struct PlanState
* wrapper */
Instrumentation *instrument; /* Optional runtime stats for this node */
+ Instrumentation *combined_instrument; /* sum for leader and workers */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
/* Per-worker JIT instrumentation */
--
2.23.0
0002-Improve-EXPLAIN-of-Sort-in-parallel-queries.patchapplication/octet-stream; name=0002-Improve-EXPLAIN-of-Sort-in-parallel-queries.patchDownload
From 0802cdd382cdb11739f9054281f3ccbb975686c0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 23 Oct 2019 17:39:09 +1300
Subject: [PATCH 2/2] Improve EXPLAIN of Sort in parallel queries.
Previously, it wasn't very clear to the uninitiated what work
a Sort node did in the leader process. Label the ANALYZE output
explicitly, like VERBOSE does.
Author: Thomas Munro
---
src/backend/commands/explain.c | 134 ++++++++++--------
src/test/regress/expected/select_parallel.out | 2 +-
2 files changed, 75 insertions(+), 61 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a1fb22c9f7..4247a3c749 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2553,86 +2553,100 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
}
}
-/*
- * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
- */
static void
-show_sort_info(SortState *sortstate, ExplainState *es)
+show_sort_info_per_process(TuplesortInstrumentation *sinstrument,
+ bool has_workers,
+ int worker_number,
+ bool *opened_group,
+ ExplainState *es)
{
- if (!es->analyze)
- return;
+ const char *sortMethod;
+ const char *spaceType;
+ long spaceUsed;
- if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
+ if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
+ return; /* ignore any unfilled slots */
+
+ sortMethod = tuplesort_method_name(sinstrument->sortMethod);
+ spaceType = tuplesort_space_type_name(sinstrument->spaceType);
+ spaceUsed = sinstrument->spaceUsed;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
{
- Tuplesortstate *state = (Tuplesortstate *) sortstate->tuplesortstate;
- TuplesortInstrumentation stats;
- const char *sortMethod;
- const char *spaceType;
- long spaceUsed;
+ char prefix[80];
- tuplesort_get_stats(state, &stats);
- sortMethod = tuplesort_method_name(stats.sortMethod);
- spaceType = tuplesort_space_type_name(stats.spaceType);
- spaceUsed = stats.spaceUsed;
+ if (!has_workers)
+ snprintf(prefix, sizeof(prefix), "");
+ else if (worker_number < 0)
+ snprintf(prefix, sizeof(prefix), "Leader: ");
+ else
+ snprintf(prefix, sizeof(prefix), "Worker %d: ", worker_number);
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str,
+ "%sSort Method: %s %s: %ldkB\n",
+ prefix, sortMethod, spaceType, spaceUsed);
+ }
+ else
+ {
+ if (!has_workers)
{
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "Sort Method: %s %s: %ldkB\n",
- sortMethod, spaceType, spaceUsed);
+ ExplainPropertyText("Sort Method", sortMethod, es);
+ ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
+ ExplainPropertyText("Sort Space Type", spaceType, es);
}
else
{
+ if (!*opened_group)
+ {
+ ExplainOpenGroup("Workers", "Workers", false, es);
+ *opened_group = true;
+ }
+ ExplainOpenGroup("Worker", NULL, true, es);
+ ExplainPropertyInteger("Worker Number", NULL, worker_number, es);
ExplainPropertyText("Sort Method", sortMethod, es);
ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
ExplainPropertyText("Sort Space Type", spaceType, es);
+ ExplainCloseGroup("Worker", NULL, true, es);
}
}
+}
- if (sortstate->shared_info != NULL)
+/*
+ * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
+ */
+static void
+show_sort_info(SortState *sortstate, ExplainState *es)
+{
+ bool opened_group = false;
+
+ if (!es->analyze)
+ return;
+
+ /*
+ * Show the leader stats for parallel queries, or only stats for
+ * non-parallel queries.
+ */
+ if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
{
- int n;
- bool opened_group = false;
+ Tuplesortstate *state = (Tuplesortstate *) sortstate->tuplesortstate;
+ TuplesortInstrumentation stats;
- for (n = 0; n < sortstate->shared_info->num_workers; n++)
- {
- TuplesortInstrumentation *sinstrument;
- const char *sortMethod;
- const char *spaceType;
- long spaceUsed;
-
- sinstrument = &sortstate->shared_info->sinstrument[n];
- if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
- continue; /* ignore any unfilled slots */
- sortMethod = tuplesort_method_name(sinstrument->sortMethod);
- spaceType = tuplesort_space_type_name(sinstrument->spaceType);
- spaceUsed = sinstrument->spaceUsed;
+ tuplesort_get_stats(state, &stats);
+ show_sort_info_per_process(&stats, sortstate->shared_info != NULL,
+ -1, &opened_group, es);
+ }
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str,
- "Worker %d: Sort Method: %s %s: %ldkB\n",
- n, sortMethod, spaceType, spaceUsed);
- }
- else
- {
- if (!opened_group)
- {
- ExplainOpenGroup("Workers", "Workers", false, es);
- opened_group = true;
- }
- ExplainOpenGroup("Worker", NULL, true, es);
- ExplainPropertyInteger("Worker Number", NULL, n, es);
- ExplainPropertyText("Sort Method", sortMethod, es);
- ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
- ExplainPropertyText("Sort Space Type", spaceType, es);
- ExplainCloseGroup("Worker", NULL, true, es);
- }
- }
- if (opened_group)
- ExplainCloseGroup("Workers", "Workers", false, es);
+ /* Show the worker stats. */
+ if (sortstate->shared_info != NULL)
+ {
+ for (int n = 0; n < sortstate->shared_info->num_workers; n++)
+ show_sort_info_per_process(&sortstate->shared_info->sinstrument[n],
+ true, n, &opened_group, es);
}
+
+ if (opened_group)
+ ExplainCloseGroup("Workers", "Workers", false, es);
}
/*
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 0eca76cb41..5c0738ccdc 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -551,7 +551,7 @@ select * from explain_parallel_sort_stats();
Workers Launched: 4
-> Sort (actual rows=2000 loops=15)
Sort Key: tenk1.ten
- Sort Method: quicksort Memory: xxx
+ Leader: Sort Method: quicksort Memory: xxx
Worker 0: Sort Method: quicksort Memory: xxx
Worker 1: Sort Method: quicksort Memory: xxx
Worker 2: Sort Method: quicksort Memory: xxx
--
2.23.0
On Wed, Oct 23, 2019 at 12:30 AM Thomas Munro <thomas.munro@gmail.com>
wrote:
While working on some slides explaining EXPLAIN, I couldn't resist the
urge to add the missing $SUBJECT. The attached 0001 patch gives the
following:Gather ... time=0.146..33.077 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=4425
-> Parallel Seq Scan on public.t ... time=19.421..30.092 rows=0 loops=3)
Filter: (t.i = 42)
Rows Removed by Filter: 333333
Leader: actual time=0.013..32.025 rows=1 loops=1 <--- NEW
Buffers: shared hit=1546 <--- NEW
Worker 0: actual time=29.069..29.069 rows=0 loops=1
Buffers: shared hit=1126
Worker 1: actual time=29.181..29.181 rows=0 loops=1
Buffers: shared hit=1753Without that, you have to deduce what work was done in the leader, but
I'd rather just show it.The 0002 patch adjusts Sort for consistency with that scheme, so you get:
Sort ... time=84.303..122.238 rows=333333 loops=3)
Output: t1.i
Sort Key: t1.i
Leader: Sort Method: external merge Disk: 5864kB <--- DIFFERENT
Worker 0: Sort Method: external merge Disk: 3376kB
Worker 1: Sort Method: external merge Disk: 4504kB
Leader: actual time=119.624..165.949 rows=426914 loops=1
Worker 0: actual time=61.239..90.984 rows=245612 loops=1
Worker 1: actual time=72.046..109.782 rows=327474 loops=1Without the "Leader" label, it's not really clear to the uninitiated
whether you're looking at combined, average or single process numbers.
Cool! I dig it.
Checked out the patches a bit and noticed that the tuplesort
instrumentation uses spaceUsed and I saw this comment in
tuplesort_get_stats()
/*
* Note: it might seem we should provide both memory and disk usage for a
* disk-based sort. However, the current code doesn't track memory space
* accurately once we have begun to return tuples to the caller (since we
* don't account for pfree's the caller is expected to do), so we cannot
* rely on availMem in a disk sort. This does not seem worth the overhead
* to fix. Is it worth creating an API for the memory context code to
* tell us how much is actually used in sortcontext?
*/
might it be worth trying out the memory accounting API
5dd7fc1519461548eebf26c33eac6878ea3e8788 here?
Of course there are some more things that could be reported in a
similar way eventually, such as filter counters and hash join details.
This made me think about other explain wishlist items.
For parallel hashjoin, I would find it useful to know which batches
each worker participated in (maybe just probing to start with, but
loading would be great too).
I'm not sure anyone else (especially users) would care about this,
though.
--
Melanie Plageman
On Wed, Oct 30, 2019 at 9:24 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Checked out the patches a bit and noticed that the tuplesort
instrumentation uses spaceUsed and I saw this comment in
tuplesort_get_stats()
might it be worth trying out the memory accounting API
5dd7fc1519461548eebf26c33eac6878ea3e8788 here?
I made exactly the same suggestion several years back, not long after
the API was first proposed by Jeff. However, tuplesort.c has changed a
lot since that time, to the extent that that comment now seems kind of
obsolete. These days, availMem accounting still isn't used at all for
disk sorts. Rather, the slab allocator is used. Virtually all the
memory used to merge is now managed by logtape.c, with only fixed
per-tape memory buffers within tuplesort.c. This per-tape stuff is
tiny anyway -- slightly more than 1KiB per tape.
It would now be relatively straightforward to report the memory used
by disk-based sorts, without needing to use the memory accounting API.
I imagine that it would report the high watermark memory usage during
the final merge. It's possible for this to be lower than the high
watermark during initial run generation (e.g. because of the
MaxAllocSize limit in buffer size within logtape.c), but that still
seems like the most useful figure to users. There'd be a new
"LogicalTapeSetMemory()" function to go along with the existing
LogicalTapeSetBlocks() function, or something along those lines.
Not planning to work on this now, but perhaps you have time for it.
--
Peter Geoghegan
On Wed, Oct 30, 2019 at 10:39:04AM -0700, Peter Geoghegan wrote:
On Wed, Oct 30, 2019 at 9:24 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:Checked out the patches a bit and noticed that the tuplesort
instrumentation uses spaceUsed and I saw this comment in
tuplesort_get_stats()might it be worth trying out the memory accounting API
5dd7fc1519461548eebf26c33eac6878ea3e8788 here?I made exactly the same suggestion several years back, not long after
the API was first proposed by Jeff. However, tuplesort.c has changed a
lot since that time, to the extent that that comment now seems kind of
obsolete. These days, availMem accounting still isn't used at all for
disk sorts. Rather, the slab allocator is used. Virtually all the
memory used to merge is now managed by logtape.c, with only fixed
per-tape memory buffers within tuplesort.c. This per-tape stuff is
tiny anyway -- slightly more than 1KiB per tape.It would now be relatively straightforward to report the memory used
by disk-based sorts, without needing to use the memory accounting API.
I imagine that it would report the high watermark memory usage during
the final merge. It's possible for this to be lower than the high
watermark during initial run generation (e.g. because of the
MaxAllocSize limit in buffer size within logtape.c), but that still
seems like the most useful figure to users. There'd be a new
"LogicalTapeSetMemory()" function to go along with the existing
LogicalTapeSetBlocks() function, or something along those lines.Not planning to work on this now, but perhaps you have time for it.
Another thing worth mentioning is that the memory accounting API does
nothing about the pfree() calls, mentioned in the comment. The memory is
tracked at the block level, so unless the pfree() frees the whole block
(which only really happens for oversized chunks) the accounting info
does not change.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 31, 2019 at 5:24 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Wed, Oct 23, 2019 at 12:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:
Of course there are some more things that could be reported in a
similar way eventually, such as filter counters and hash join details.This made me think about other explain wishlist items.
For parallel hashjoin, I would find it useful to know which batches
each worker participated in (maybe just probing to start with, but
loading would be great too).I'm not sure anyone else (especially users) would care about this,
though.
Yeah, I think that'd be interesting. At some point in the patch set
when I was working on the batch load balancing strategy I showed the
number of tuples hashed and number of batches probed by each process
(not the actual batch numbers, since that seems a bit over the top):
/messages/by-id/CAEepm=0th8Le2SDCv32zN7tMyCJYR9oGYJ52fXNYJz1hrpGW+Q@mail.gmail.com
I guess I thought of that as a debugging feature and took it out
because it was too verbose, but maybe it just needs to be controlled
by the VERBOSE switch. Do you think we should put that back?
On Mon, Nov 4, 2019 at 12:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I guess I thought of that as a debugging feature and took it out
because it was too verbose, but maybe it just needs to be controlled
by the VERBOSE switch. Do you think we should put that back?
By which I mean: would you like to send a patch? :-)
Here is a new version of the "Leader:" patch, because cfbot told me
that gcc didn't like it as much as clang.
Attachments:
0001-Show-parallel-leader-stats-in-EXPLAIN-output-v2.patchapplication/octet-stream; name=0001-Show-parallel-leader-stats-in-EXPLAIN-output-v2.patchDownload
From cc82173f5927bb602d4b0d0b900a44ea72b89a77 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 23 Oct 2019 16:17:57 +1300
Subject: [PATCH 1/2] Show parallel leader stats in EXPLAIN output.
Previously, we only showed VERBOSE information from parallel
workers. Show the leader's contribution too.
Author: Thomas Munro
Reviewed-by: Melanie Plageman
Discussion: https://postgr.es/m/CA+hUKG+Z22=vkVkXtKRznzRdtj=MtygfZMYUJs8j7ObjzkG1Lw@mail.gmail.com
---
src/backend/commands/explain.c | 214 ++++++++++++++++------------
src/backend/executor/execParallel.c | 18 +--
src/include/nodes/execnodes.h | 1 +
3 files changed, 130 insertions(+), 103 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62fb3434a3..a1fb22c9f7 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -105,7 +105,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instrument, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
@@ -1041,6 +1041,68 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
}
+static void
+ExplainNodePerProcess(ExplainState *es, bool *opened_group,
+ int worker_number, Instrumentation *instrument)
+{
+ double nloops = instrument->nloops;
+ double startup_ms;
+ double total_ms;
+ double rows;
+
+ if (nloops <= 0)
+ return;
+ startup_ms = 1000.0 * instrument->startup / nloops;
+ total_ms = 1000.0 * instrument->total / nloops;
+ rows = instrument->ntuples / nloops;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ if (worker_number < 0)
+ appendStringInfo(es->str, "Leader: ");
+ else
+ appendStringInfo(es->str, "Worker %d: ", worker_number);
+ if (es->timing)
+ appendStringInfo(es->str,
+ "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ "actual rows=%.0f loops=%.0f\n",
+ rows, nloops);
+ es->indent++;
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ es->indent--;
+ }
+ else
+ {
+ if (!*opened_group)
+ {
+ ExplainOpenGroup("Workers", "Workers", false, es);
+ *opened_group = true;
+ }
+ ExplainOpenGroup("Worker", NULL, true, es);
+ ExplainPropertyInteger("Worker Number", NULL, worker_number, es);
+
+ if (es->timing)
+ {
+ ExplainPropertyFloat("Actual Startup Time", "ms",
+ startup_ms, 3, es);
+ ExplainPropertyFloat("Actual Total Time", "ms",
+ total_ms, 3, es);
+ }
+ ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+ ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+
+ ExplainCloseGroup("Worker", NULL, true, es);
+ }
+}
+
/*
* ExplainNode -
* Appends a description of a plan tree to es->str
@@ -1067,6 +1129,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainState *es)
{
Plan *plan = planstate->plan;
+ Instrumentation *show_instrument;
const char *pname; /* node type name for text output */
const char *sname; /* node type name for non-text output */
const char *strategy = NULL;
@@ -1076,6 +1139,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
int save_indent = es->indent;
bool haschildren;
+ /* If this is a parallel query, show the combined stats. */
+ if (planstate->combined_instrument)
+ show_instrument = planstate->combined_instrument;
+ else
+ show_instrument = planstate->instrument;
+
switch (nodeTag(plan))
{
case T_Result:
@@ -1505,13 +1574,19 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (planstate->instrument)
InstrEndLoop(planstate->instrument);
- if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ /*
+ * If we have combined instrumentation from parallel workers, add the
+ * leader's contribution.
+ */
+ if (planstate->combined_instrument && planstate->instrument)
+ InstrAggNode(planstate->combined_instrument, planstate->instrument);
+
+ if (es->analyze && show_instrument && show_instrument->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = show_instrument->nloops;
+ double startup_ms = 1000.0 * show_instrument->startup / nloops;
+ double total_ms = 1000.0 * show_instrument->total / nloops;
+ double rows = show_instrument->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1586,26 +1661,26 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->indexqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
planstate->instrument->ntuples2, 0, es);
@@ -1619,11 +1694,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (es->analyze)
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
@@ -1641,7 +1716,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Gather:
{
@@ -1650,7 +1725,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -1696,7 +1771,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -1734,7 +1809,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -1748,7 +1823,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_TidScan:
{
@@ -1764,14 +1839,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -1781,7 +1856,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -1791,11 +1866,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -1804,11 +1879,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -1817,25 +1892,25 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Group:
show_group_keys(castNode(GroupState, planstate), ancestors, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -1851,7 +1926,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -1865,73 +1940,21 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->buffers && show_instrument)
+ show_buffer_usage(es, &show_instrument->bufusage);
/* Show worker detail */
if (es->analyze && es->verbose && planstate->worker_instrument)
{
WorkerInstrumentation *w = planstate->worker_instrument;
bool opened_group = false;
- int n;
- for (n = 0; n < w->num_workers; ++n)
- {
- Instrumentation *instrument = &w->instrument[n];
- double nloops = instrument->nloops;
- double startup_ms;
- double total_ms;
- double rows;
-
- if (nloops <= 0)
- continue;
- startup_ms = 1000.0 * instrument->startup / nloops;
- total_ms = 1000.0 * instrument->total / nloops;
- rows = instrument->ntuples / nloops;
+ /* Leader */
+ ExplainNodePerProcess(es, &opened_group, -1, planstate->instrument);
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "Worker %d: ", n);
- if (es->timing)
- appendStringInfo(es->str,
- "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
- startup_ms, total_ms, rows, nloops);
- else
- appendStringInfo(es->str,
- "actual rows=%.0f loops=%.0f\n",
- rows, nloops);
- es->indent++;
- if (es->buffers)
- show_buffer_usage(es, &instrument->bufusage);
- es->indent--;
- }
- else
- {
- if (!opened_group)
- {
- ExplainOpenGroup("Workers", "Workers", false, es);
- opened_group = true;
- }
- ExplainOpenGroup("Worker", NULL, true, es);
- ExplainPropertyInteger("Worker Number", NULL, n, es);
-
- if (es->timing)
- {
- ExplainPropertyFloat("Actual Startup Time", "ms",
- startup_ms, 3, es);
- ExplainPropertyFloat("Actual Total Time", "ms",
- total_ms, 3, es);
- }
- ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
- ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-
- if (es->buffers)
- show_buffer_usage(es, &instrument->bufusage);
-
- ExplainCloseGroup("Worker", NULL, true, es);
- }
- }
+ /* Workers */
+ for (int n = 0; n < w->num_workers; ++n)
+ ExplainNodePerProcess(es, &opened_group, n, &w->instrument[n]);
if (opened_group)
ExplainCloseGroup("Workers", "Workers", false, es);
@@ -2750,19 +2773,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instrument, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instrument)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instrument->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instrument->nfiltered1;
+ nloops = instrument->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -3280,7 +3303,8 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1,
+ mtstate->ps.instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 53cd2fc666..26a04692ab 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -987,20 +987,22 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
if (i >= instrumentation->num_plan_nodes)
elog(ERROR, "plan node %d not found", plan_node_id);
- /* Accumulate the statistics from all workers. */
- instrument = GetInstrumentationArray(instrumentation);
- instrument += i * instrumentation->num_workers;
- for (n = 0; n < instrumentation->num_workers; ++n)
- InstrAggNode(planstate->instrument, &instrument[n]);
-
/*
- * Also store the per-worker detail.
- *
* Worker instrumentation should be allocated in the same context as the
* regular instrumentation information, which is the per-query context.
* Switch into per-query memory context.
*/
oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
+
+ /* Accumulate the statistics from all workers. */
+ if (!planstate->combined_instrument)
+ planstate->combined_instrument = palloc0(sizeof(Instrumentation));
+ instrument = GetInstrumentationArray(instrumentation);
+ instrument += i * instrumentation->num_workers;
+ for (n = 0; n < instrumentation->num_workers; ++n)
+ InstrAggNode(planstate->combined_instrument, &instrument[n]);
+
+ /* Also store the per-worker detail. */
ibytes = mul_size(instrumentation->num_workers, sizeof(Instrumentation));
planstate->worker_instrument =
palloc(ibytes + offsetof(WorkerInstrumentation, instrument));
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 44f76082e9..cfeddbe5fc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -948,6 +948,7 @@ typedef struct PlanState
* wrapper */
Instrumentation *instrument; /* Optional runtime stats for this node */
+ Instrumentation *combined_instrument; /* sum for leader and workers */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
/* Per-worker JIT instrumentation */
--
2.23.0
0002-Improve-EXPLAIN-of-Sort-in-parallel-queries-v2.patchapplication/octet-stream; name=0002-Improve-EXPLAIN-of-Sort-in-parallel-queries-v2.patchDownload
From cdd85e5e070d97566296f074deaacb0a523f7a9a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 23 Oct 2019 17:39:09 +1300
Subject: [PATCH 2/2] Improve EXPLAIN of Sort in parallel queries.
Previously, it wasn't very clear to the uninitiated what work
a Sort node did in the leader process. Label the ANALYZE output
explicitly, like VERBOSE does.
Author: Thomas Munro
Reviewed-by: Melanie Plageman
Discussion: https://postgr.es/m/CA+hUKG+Z22=vkVkXtKRznzRdtj=MtygfZMYUJs8j7ObjzkG1Lw@mail.gmail.com
---
src/backend/commands/explain.c | 134 ++++++++++--------
src/test/regress/expected/select_parallel.out | 2 +-
2 files changed, 75 insertions(+), 61 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a1fb22c9f7..03326d6f8f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2553,86 +2553,100 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
}
}
-/*
- * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
- */
static void
-show_sort_info(SortState *sortstate, ExplainState *es)
+show_sort_info_per_process(TuplesortInstrumentation *sinstrument,
+ bool has_workers,
+ int worker_number,
+ bool *opened_group,
+ ExplainState *es)
{
- if (!es->analyze)
- return;
+ const char *sortMethod;
+ const char *spaceType;
+ long spaceUsed;
- if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
+ if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
+ return; /* ignore any unfilled slots */
+
+ sortMethod = tuplesort_method_name(sinstrument->sortMethod);
+ spaceType = tuplesort_space_type_name(sinstrument->spaceType);
+ spaceUsed = sinstrument->spaceUsed;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
{
- Tuplesortstate *state = (Tuplesortstate *) sortstate->tuplesortstate;
- TuplesortInstrumentation stats;
- const char *sortMethod;
- const char *spaceType;
- long spaceUsed;
+ char prefix[80];
- tuplesort_get_stats(state, &stats);
- sortMethod = tuplesort_method_name(stats.sortMethod);
- spaceType = tuplesort_space_type_name(stats.spaceType);
- spaceUsed = stats.spaceUsed;
+ if (!has_workers)
+ prefix[0] = '\0';
+ else if (worker_number < 0)
+ snprintf(prefix, sizeof(prefix), "Leader: ");
+ else
+ snprintf(prefix, sizeof(prefix), "Worker %d: ", worker_number);
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str,
+ "%sSort Method: %s %s: %ldkB\n",
+ prefix, sortMethod, spaceType, spaceUsed);
+ }
+ else
+ {
+ if (!has_workers)
{
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "Sort Method: %s %s: %ldkB\n",
- sortMethod, spaceType, spaceUsed);
+ ExplainPropertyText("Sort Method", sortMethod, es);
+ ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
+ ExplainPropertyText("Sort Space Type", spaceType, es);
}
else
{
+ if (!*opened_group)
+ {
+ ExplainOpenGroup("Workers", "Workers", false, es);
+ *opened_group = true;
+ }
+ ExplainOpenGroup("Worker", NULL, true, es);
+ ExplainPropertyInteger("Worker Number", NULL, worker_number, es);
ExplainPropertyText("Sort Method", sortMethod, es);
ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
ExplainPropertyText("Sort Space Type", spaceType, es);
+ ExplainCloseGroup("Worker", NULL, true, es);
}
}
+}
- if (sortstate->shared_info != NULL)
+/*
+ * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
+ */
+static void
+show_sort_info(SortState *sortstate, ExplainState *es)
+{
+ bool opened_group = false;
+
+ if (!es->analyze)
+ return;
+
+ /*
+ * Show the leader stats for parallel queries, or only stats for
+ * non-parallel queries.
+ */
+ if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
{
- int n;
- bool opened_group = false;
+ Tuplesortstate *state = (Tuplesortstate *) sortstate->tuplesortstate;
+ TuplesortInstrumentation stats;
- for (n = 0; n < sortstate->shared_info->num_workers; n++)
- {
- TuplesortInstrumentation *sinstrument;
- const char *sortMethod;
- const char *spaceType;
- long spaceUsed;
-
- sinstrument = &sortstate->shared_info->sinstrument[n];
- if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
- continue; /* ignore any unfilled slots */
- sortMethod = tuplesort_method_name(sinstrument->sortMethod);
- spaceType = tuplesort_space_type_name(sinstrument->spaceType);
- spaceUsed = sinstrument->spaceUsed;
+ tuplesort_get_stats(state, &stats);
+ show_sort_info_per_process(&stats, sortstate->shared_info != NULL,
+ -1, &opened_group, es);
+ }
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str,
- "Worker %d: Sort Method: %s %s: %ldkB\n",
- n, sortMethod, spaceType, spaceUsed);
- }
- else
- {
- if (!opened_group)
- {
- ExplainOpenGroup("Workers", "Workers", false, es);
- opened_group = true;
- }
- ExplainOpenGroup("Worker", NULL, true, es);
- ExplainPropertyInteger("Worker Number", NULL, n, es);
- ExplainPropertyText("Sort Method", sortMethod, es);
- ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
- ExplainPropertyText("Sort Space Type", spaceType, es);
- ExplainCloseGroup("Worker", NULL, true, es);
- }
- }
- if (opened_group)
- ExplainCloseGroup("Workers", "Workers", false, es);
+ /* Show the worker stats. */
+ if (sortstate->shared_info != NULL)
+ {
+ for (int n = 0; n < sortstate->shared_info->num_workers; n++)
+ show_sort_info_per_process(&sortstate->shared_info->sinstrument[n],
+ true, n, &opened_group, es);
}
+
+ if (opened_group)
+ ExplainCloseGroup("Workers", "Workers", false, es);
}
/*
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 0eca76cb41..5c0738ccdc 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -551,7 +551,7 @@ select * from explain_parallel_sort_stats();
Workers Launched: 4
-> Sort (actual rows=2000 loops=15)
Sort Key: tenk1.ten
- Sort Method: quicksort Memory: xxx
+ Leader: Sort Method: quicksort Memory: xxx
Worker 0: Sort Method: quicksort Memory: xxx
Worker 1: Sort Method: quicksort Memory: xxx
Worker 2: Sort Method: quicksort Memory: xxx
--
2.23.0
On Mon, 4 Nov 2019 at 00:30, Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Nov 4, 2019 at 12:11 PM Thomas Munro <thomas.munro@gmail.com>
wrote:I guess I thought of that as a debugging feature and took it out
because it was too verbose, but maybe it just needs to be controlled
by the VERBOSE switch. Do you think we should put that back?By which I mean: would you like to send a patch? :-)
Here is a new version of the "Leader:" patch, because cfbot told me
that gcc didn't like it as much as clang.
I was reviewing this patch and here are a few comments,
+static void
+ExplainNodePerProcess(ExplainState *es, bool *opened_group,
+ int worker_number, Instrumentation *instrument)
+{
A small description about this routine would be helpful and will give the
file a consistent look.
Also, I noticed that the worker details are displayed for sort node even
without verbose, but for scans it is only with verbose. Am I missing
something or there is something behind? However, I am not sure if this is
the introduced by this patch-set.
--
Regards,
Rafia Sabih
On Thu, Nov 7, 2019 at 11:37 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
I was reviewing this patch and here are a few comments,
Hi Rafia,
Thanks!
+static void +ExplainNodePerProcess(ExplainState *es, bool *opened_group, + int worker_number, Instrumentation *instrument) +{A small description about this routine would be helpful and will give the file a consistent look.
Done for both new functions. I also improved the commit message for
0001 a bit to explain the change better.
Also, I noticed that the worker details are displayed for sort node even without verbose, but for scans it is only with verbose. Am I missing something or there is something behind? However, I am not sure if this is the introduced by this patch-set.
Yeah, it's a pre-existing thing, but I agree it's an interesting
difference. We currently don't have a way to show a 'combined'
version of a parallel (oblivious) sort: we always show the per-process
version, and all this patch changes is how we label the leader's
stats. I suppose someone could argue that in non-VERBOSE mode we
should show the total memory usage (sum from all processes). I suppose
it's possible they use different sort types (one worker runs out of
work_mem and another doesn't), and I'm not sure how how you'd
aggregate that.
Attachments:
0001-Show-parallel-leader-stats-in-EXPLAIN-output-v3.patchapplication/octet-stream; name=0001-Show-parallel-leader-stats-in-EXPLAIN-output-v3.patchDownload
From 7e3739cec08140154818e6f75b1ca74e0f2e9234 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 23 Oct 2019 16:17:57 +1300
Subject: [PATCH 1/2] Show parallel leader stats in EXPLAIN output.
Previously, we only showed VERBOSE information from parallel
workers. Show the leader's contribution too.
Instead of aggregating all workers' instrumentation into the
leader's, create a separate instrumentation object into which
all processes' contributions are aggregated. Then the leader's
is still available for separate display.
Author: Thomas Munro
Reviewed-by: Melanie Plageman, Rafia Sabih
Discussion: https://postgr.es/m/CA+hUKG+Z22=vkVkXtKRznzRdtj=MtygfZMYUJs8j7ObjzkG1Lw@mail.gmail.com
---
src/backend/commands/explain.c | 221 ++++++++++++++++------------
src/backend/executor/execParallel.c | 18 ++-
src/include/nodes/execnodes.h | 1 +
3 files changed, 137 insertions(+), 103 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62fb3434a3..89c39de72a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -105,7 +105,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instrument, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
@@ -1041,6 +1041,75 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
}
+/*
+ * ExplainNodePerProcess -
+ * Show the per-process instrumentation for a node run in parallel.
+ *
+ * Subroutine for ExplainNode() when showing a parallel query in ANALYZE
+ * VERBOSE mode. Use worker number -1 for the leader process.
+ */
+static void
+ExplainNodePerProcess(ExplainState *es, bool *opened_group,
+ int worker_number, Instrumentation *instrument)
+{
+ double nloops = instrument->nloops;
+ double startup_ms;
+ double total_ms;
+ double rows;
+
+ if (nloops <= 0)
+ return;
+ startup_ms = 1000.0 * instrument->startup / nloops;
+ total_ms = 1000.0 * instrument->total / nloops;
+ rows = instrument->ntuples / nloops;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ if (worker_number < 0)
+ appendStringInfo(es->str, "Leader: ");
+ else
+ appendStringInfo(es->str, "Worker %d: ", worker_number);
+ if (es->timing)
+ appendStringInfo(es->str,
+ "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ "actual rows=%.0f loops=%.0f\n",
+ rows, nloops);
+ es->indent++;
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ es->indent--;
+ }
+ else
+ {
+ if (!*opened_group)
+ {
+ ExplainOpenGroup("Workers", "Workers", false, es);
+ *opened_group = true;
+ }
+ ExplainOpenGroup("Worker", NULL, true, es);
+ ExplainPropertyInteger("Worker Number", NULL, worker_number, es);
+
+ if (es->timing)
+ {
+ ExplainPropertyFloat("Actual Startup Time", "ms",
+ startup_ms, 3, es);
+ ExplainPropertyFloat("Actual Total Time", "ms",
+ total_ms, 3, es);
+ }
+ ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+ ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+
+ ExplainCloseGroup("Worker", NULL, true, es);
+ }
+}
+
/*
* ExplainNode -
* Appends a description of a plan tree to es->str
@@ -1067,6 +1136,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainState *es)
{
Plan *plan = planstate->plan;
+ Instrumentation *show_instrument;
const char *pname; /* node type name for text output */
const char *sname; /* node type name for non-text output */
const char *strategy = NULL;
@@ -1076,6 +1146,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
int save_indent = es->indent;
bool haschildren;
+ /* If this is a parallel query, show the combined stats. */
+ if (planstate->combined_instrument)
+ show_instrument = planstate->combined_instrument;
+ else
+ show_instrument = planstate->instrument;
+
switch (nodeTag(plan))
{
case T_Result:
@@ -1505,13 +1581,19 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (planstate->instrument)
InstrEndLoop(planstate->instrument);
- if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ /*
+ * If we have combined instrumentation from parallel workers, add the
+ * leader's contribution.
+ */
+ if (planstate->combined_instrument && planstate->instrument)
+ InstrAggNode(planstate->combined_instrument, planstate->instrument);
+
+ if (es->analyze && show_instrument && show_instrument->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = show_instrument->nloops;
+ double startup_ms = 1000.0 * show_instrument->startup / nloops;
+ double total_ms = 1000.0 * show_instrument->total / nloops;
+ double rows = show_instrument->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1586,26 +1668,26 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->indexqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
planstate->instrument->ntuples2, 0, es);
@@ -1619,11 +1701,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (es->analyze)
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
@@ -1641,7 +1723,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Gather:
{
@@ -1650,7 +1732,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -1696,7 +1778,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -1734,7 +1816,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -1748,7 +1830,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_TidScan:
{
@@ -1764,14 +1846,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -1781,7 +1863,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -1791,11 +1873,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -1804,11 +1886,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -1817,25 +1899,25 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Group:
show_group_keys(castNode(GroupState, planstate), ancestors, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -1851,7 +1933,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -1865,73 +1947,21 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->buffers && show_instrument)
+ show_buffer_usage(es, &show_instrument->bufusage);
/* Show worker detail */
if (es->analyze && es->verbose && planstate->worker_instrument)
{
WorkerInstrumentation *w = planstate->worker_instrument;
bool opened_group = false;
- int n;
-
- for (n = 0; n < w->num_workers; ++n)
- {
- Instrumentation *instrument = &w->instrument[n];
- double nloops = instrument->nloops;
- double startup_ms;
- double total_ms;
- double rows;
-
- if (nloops <= 0)
- continue;
- startup_ms = 1000.0 * instrument->startup / nloops;
- total_ms = 1000.0 * instrument->total / nloops;
- rows = instrument->ntuples / nloops;
-
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "Worker %d: ", n);
- if (es->timing)
- appendStringInfo(es->str,
- "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
- startup_ms, total_ms, rows, nloops);
- else
- appendStringInfo(es->str,
- "actual rows=%.0f loops=%.0f\n",
- rows, nloops);
- es->indent++;
- if (es->buffers)
- show_buffer_usage(es, &instrument->bufusage);
- es->indent--;
- }
- else
- {
- if (!opened_group)
- {
- ExplainOpenGroup("Workers", "Workers", false, es);
- opened_group = true;
- }
- ExplainOpenGroup("Worker", NULL, true, es);
- ExplainPropertyInteger("Worker Number", NULL, n, es);
-
- if (es->timing)
- {
- ExplainPropertyFloat("Actual Startup Time", "ms",
- startup_ms, 3, es);
- ExplainPropertyFloat("Actual Total Time", "ms",
- total_ms, 3, es);
- }
- ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
- ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
- if (es->buffers)
- show_buffer_usage(es, &instrument->bufusage);
+ /* Leader */
+ ExplainNodePerProcess(es, &opened_group, -1, planstate->instrument);
- ExplainCloseGroup("Worker", NULL, true, es);
- }
- }
+ /* Workers */
+ for (int n = 0; n < w->num_workers; ++n)
+ ExplainNodePerProcess(es, &opened_group, n, &w->instrument[n]);
if (opened_group)
ExplainCloseGroup("Workers", "Workers", false, es);
@@ -2750,19 +2780,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instrument, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instrument)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instrument->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instrument->nfiltered1;
+ nloops = instrument->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -3280,7 +3310,8 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1,
+ mtstate->ps.instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 53cd2fc666..26a04692ab 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -987,20 +987,22 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
if (i >= instrumentation->num_plan_nodes)
elog(ERROR, "plan node %d not found", plan_node_id);
- /* Accumulate the statistics from all workers. */
- instrument = GetInstrumentationArray(instrumentation);
- instrument += i * instrumentation->num_workers;
- for (n = 0; n < instrumentation->num_workers; ++n)
- InstrAggNode(planstate->instrument, &instrument[n]);
-
/*
- * Also store the per-worker detail.
- *
* Worker instrumentation should be allocated in the same context as the
* regular instrumentation information, which is the per-query context.
* Switch into per-query memory context.
*/
oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
+
+ /* Accumulate the statistics from all workers. */
+ if (!planstate->combined_instrument)
+ planstate->combined_instrument = palloc0(sizeof(Instrumentation));
+ instrument = GetInstrumentationArray(instrumentation);
+ instrument += i * instrumentation->num_workers;
+ for (n = 0; n < instrumentation->num_workers; ++n)
+ InstrAggNode(planstate->combined_instrument, &instrument[n]);
+
+ /* Also store the per-worker detail. */
ibytes = mul_size(instrumentation->num_workers, sizeof(Instrumentation));
planstate->worker_instrument =
palloc(ibytes + offsetof(WorkerInstrumentation, instrument));
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 44f76082e9..cfeddbe5fc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -948,6 +948,7 @@ typedef struct PlanState
* wrapper */
Instrumentation *instrument; /* Optional runtime stats for this node */
+ Instrumentation *combined_instrument; /* sum for leader and workers */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
/* Per-worker JIT instrumentation */
--
2.23.0
0002-Improve-EXPLAIN-of-Sort-in-parallel-queries-v3.patchapplication/octet-stream; name=0002-Improve-EXPLAIN-of-Sort-in-parallel-queries-v3.patchDownload
From 23be3e560fbd99fd142e177d04bbc1d8cee4c459 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 23 Oct 2019 17:39:09 +1300
Subject: [PATCH 2/2] Improve EXPLAIN of Sort in parallel queries.
Previously, it wasn't very clear to the uninitiated what work
a Sort node did in the leader process. Label the ANALYZE output
explicitly, like VERBOSE does.
Author: Thomas Munro
Reviewed-by: Melanie Plageman, Rafia Sabih
Discussion: https://postgr.es/m/CA+hUKG+Z22=vkVkXtKRznzRdtj=MtygfZMYUJs8j7ObjzkG1Lw@mail.gmail.com
---
src/backend/commands/explain.c | 133 ++++++++++--------
src/test/regress/expected/select_parallel.out | 2 +-
2 files changed, 76 insertions(+), 59 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 89c39de72a..dcc359de89 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2561,85 +2561,102 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
}
/*
- * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
+ * Workhorse for show_sort_info(). Use work number -1 for the leader process.
*/
static void
-show_sort_info(SortState *sortstate, ExplainState *es)
+show_sort_info_per_process(TuplesortInstrumentation *sinstrument,
+ bool has_workers,
+ int worker_number,
+ bool *opened_group,
+ ExplainState *es)
{
- if (!es->analyze)
- return;
+ const char *sortMethod;
+ const char *spaceType;
+ long spaceUsed;
- if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
+ if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
+ return; /* ignore any unfilled slots */
+
+ sortMethod = tuplesort_method_name(sinstrument->sortMethod);
+ spaceType = tuplesort_space_type_name(sinstrument->spaceType);
+ spaceUsed = sinstrument->spaceUsed;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
{
- Tuplesortstate *state = (Tuplesortstate *) sortstate->tuplesortstate;
- TuplesortInstrumentation stats;
- const char *sortMethod;
- const char *spaceType;
- long spaceUsed;
+ char prefix[80];
- tuplesort_get_stats(state, &stats);
- sortMethod = tuplesort_method_name(stats.sortMethod);
- spaceType = tuplesort_space_type_name(stats.spaceType);
- spaceUsed = stats.spaceUsed;
+ if (!has_workers)
+ prefix[0] = '\0';
+ else if (worker_number < 0)
+ snprintf(prefix, sizeof(prefix), "Leader: ");
+ else
+ snprintf(prefix, sizeof(prefix), "Worker %d: ", worker_number);
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str,
+ "%sSort Method: %s %s: %ldkB\n",
+ prefix, sortMethod, spaceType, spaceUsed);
+ }
+ else
+ {
+ if (!has_workers)
{
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "Sort Method: %s %s: %ldkB\n",
- sortMethod, spaceType, spaceUsed);
+ ExplainPropertyText("Sort Method", sortMethod, es);
+ ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
+ ExplainPropertyText("Sort Space Type", spaceType, es);
}
else
{
+ if (!*opened_group)
+ {
+ ExplainOpenGroup("Workers", "Workers", false, es);
+ *opened_group = true;
+ }
+ ExplainOpenGroup("Worker", NULL, true, es);
+ ExplainPropertyInteger("Worker Number", NULL, worker_number, es);
ExplainPropertyText("Sort Method", sortMethod, es);
ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
ExplainPropertyText("Sort Space Type", spaceType, es);
+ ExplainCloseGroup("Worker", NULL, true, es);
}
}
+}
- if (sortstate->shared_info != NULL)
+/*
+ * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
+ */
+static void
+show_sort_info(SortState *sortstate, ExplainState *es)
+{
+ bool opened_group = false;
+
+ if (!es->analyze)
+ return;
+
+ /*
+ * Show the leader stats for parallel queries, or only stats for
+ * non-parallel queries.
+ */
+ if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
{
- int n;
- bool opened_group = false;
+ Tuplesortstate *state = (Tuplesortstate *) sortstate->tuplesortstate;
+ TuplesortInstrumentation stats;
- for (n = 0; n < sortstate->shared_info->num_workers; n++)
- {
- TuplesortInstrumentation *sinstrument;
- const char *sortMethod;
- const char *spaceType;
- long spaceUsed;
-
- sinstrument = &sortstate->shared_info->sinstrument[n];
- if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
- continue; /* ignore any unfilled slots */
- sortMethod = tuplesort_method_name(sinstrument->sortMethod);
- spaceType = tuplesort_space_type_name(sinstrument->spaceType);
- spaceUsed = sinstrument->spaceUsed;
+ tuplesort_get_stats(state, &stats);
+ show_sort_info_per_process(&stats, sortstate->shared_info != NULL,
+ -1, &opened_group, es);
+ }
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str,
- "Worker %d: Sort Method: %s %s: %ldkB\n",
- n, sortMethod, spaceType, spaceUsed);
- }
- else
- {
- if (!opened_group)
- {
- ExplainOpenGroup("Workers", "Workers", false, es);
- opened_group = true;
- }
- ExplainOpenGroup("Worker", NULL, true, es);
- ExplainPropertyInteger("Worker Number", NULL, n, es);
- ExplainPropertyText("Sort Method", sortMethod, es);
- ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
- ExplainPropertyText("Sort Space Type", spaceType, es);
- ExplainCloseGroup("Worker", NULL, true, es);
- }
- }
- if (opened_group)
- ExplainCloseGroup("Workers", "Workers", false, es);
+ /* Show the worker stats. */
+ if (sortstate->shared_info != NULL)
+ {
+ for (int n = 0; n < sortstate->shared_info->num_workers; n++)
+ show_sort_info_per_process(&sortstate->shared_info->sinstrument[n],
+ true, n, &opened_group, es);
}
+
+ if (opened_group)
+ ExplainCloseGroup("Workers", "Workers", false, es);
}
/*
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 0eca76cb41..5c0738ccdc 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -551,7 +551,7 @@ select * from explain_parallel_sort_stats();
Workers Launched: 4
-> Sort (actual rows=2000 loops=15)
Sort Key: tenk1.ten
- Sort Method: quicksort Memory: xxx
+ Leader: Sort Method: quicksort Memory: xxx
Worker 0: Sort Method: quicksort Memory: xxx
Worker 1: Sort Method: quicksort Memory: xxx
Worker 2: Sort Method: quicksort Memory: xxx
--
2.23.0
Both patches aren't applying cleanly anymore.
The first patch in the set applied cleanly for me before b925a00f4ef65
It mostly seems like the default settings for the patch program were
my problem, but, since I noticed that the patch tester bot was failing
to apply it also, I thought I would suggest rebasing it.
I applied it to the sha before b925a00f4ef65 and then cherry-picked it
to master and it applied fine. I attached that rebased patch with a
new version number (is that the preferred way to indicate that it is
newer even if it contains no new content?).
The second patch in the set needed a bit more looking at to rebase,
which I didn't do yet.
I played around with the first patch in the patchset and very much
appreciate seeing the leaders contribution.
However, I noticed that there were no EXPLAIN diffs in any test files
and just wondered if this was a conscious choice (even with xxx the
actual numbers, I would have thought that there would be an EXPLAIN
VERBOSE with leader participation somewhere in regress).
--
Melanie Plageman
Attachments:
0001-Show-parallel-leader-stats-in-EXPLAIN-output-v4.patchapplication/octet-stream; name=0001-Show-parallel-leader-stats-in-EXPLAIN-output-v4.patchDownload
From 8a5c4cab9bce4a88865b2bf5e2ed9c35f31ddd78 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 10 Jan 2020 11:21:46 -0800
Subject: [PATCH 1/2] Show parallel leader stats in EXPLAIN output.
Previously, we only showed VERBOSE information from parallel
workers. Show the leader's contribution too.
Instead of aggregating all workers' instrumentation into the
leader's, create a separate instrumentation object into which
all processes' contributions are aggregated. Then the leader's
is still available for separate display.
Author: Thomas Munro
Reviewed-by: Melanie Plageman, Rafia Sabih
Discussion: https://postgr.es/m/CA+hUKG+Z22=vkVkXtKRznzRdtj=MtygfZMYUJs8j7ObjzkG1Lw@mail.gmail.com
---
src/backend/commands/explain.c | 221 ++++++++++++++++------------
src/backend/executor/execParallel.c | 18 ++-
src/include/nodes/execnodes.h | 1 +
3 files changed, 137 insertions(+), 103 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..6caf2505a9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -105,7 +105,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
static void show_tidbitmap_info(BitmapHeapScanState *planstate,
ExplainState *es);
static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
+ Instrumentation *instrument, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
@@ -1058,6 +1058,75 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
}
+/*
+ * ExplainNodePerProcess -
+ * Show the per-process instrumentation for a node run in parallel.
+ *
+ * Subroutine for ExplainNode() when showing a parallel query in ANALYZE
+ * VERBOSE mode. Use worker number -1 for the leader process.
+ */
+static void
+ExplainNodePerProcess(ExplainState *es, bool *opened_group,
+ int worker_number, Instrumentation *instrument)
+{
+ double nloops = instrument->nloops;
+ double startup_ms;
+ double total_ms;
+ double rows;
+
+ if (nloops <= 0)
+ return;
+ startup_ms = 1000.0 * instrument->startup / nloops;
+ total_ms = 1000.0 * instrument->total / nloops;
+ rows = instrument->ntuples / nloops;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ if (worker_number < 0)
+ appendStringInfo(es->str, "Leader: ");
+ else
+ appendStringInfo(es->str, "Worker %d: ", worker_number);
+ if (es->timing)
+ appendStringInfo(es->str,
+ "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+ startup_ms, total_ms, rows, nloops);
+ else
+ appendStringInfo(es->str,
+ "actual rows=%.0f loops=%.0f\n",
+ rows, nloops);
+ es->indent++;
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ es->indent--;
+ }
+ else
+ {
+ if (!*opened_group)
+ {
+ ExplainOpenGroup("Workers", "Workers", false, es);
+ *opened_group = true;
+ }
+ ExplainOpenGroup("Worker", NULL, true, es);
+ ExplainPropertyInteger("Worker Number", NULL, worker_number, es);
+
+ if (es->timing)
+ {
+ ExplainPropertyFloat("Actual Startup Time", "ms",
+ startup_ms, 3, es);
+ ExplainPropertyFloat("Actual Total Time", "ms",
+ total_ms, 3, es);
+ }
+ ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+ ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+
+ ExplainCloseGroup("Worker", NULL, true, es);
+ }
+}
+
/*
* ExplainNode -
* Appends a description of a plan tree to es->str
@@ -1084,6 +1153,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainState *es)
{
Plan *plan = planstate->plan;
+ Instrumentation *show_instrument;
const char *pname; /* node type name for text output */
const char *sname; /* node type name for non-text output */
const char *strategy = NULL;
@@ -1093,6 +1163,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
int save_indent = es->indent;
bool haschildren;
+ /* If this is a parallel query, show the combined stats. */
+ if (planstate->combined_instrument)
+ show_instrument = planstate->combined_instrument;
+ else
+ show_instrument = planstate->instrument;
+
switch (nodeTag(plan))
{
case T_Result:
@@ -1522,13 +1598,19 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (planstate->instrument)
InstrEndLoop(planstate->instrument);
- if (es->analyze &&
- planstate->instrument && planstate->instrument->nloops > 0)
+ /*
+ * If we have combined instrumentation from parallel workers, add the
+ * leader's contribution.
+ */
+ if (planstate->combined_instrument && planstate->instrument)
+ InstrAggNode(planstate->combined_instrument, planstate->instrument);
+
+ if (es->analyze && show_instrument && show_instrument->nloops > 0)
{
- double nloops = planstate->instrument->nloops;
- double startup_ms = 1000.0 * planstate->instrument->startup / nloops;
- double total_ms = 1000.0 * planstate->instrument->total / nloops;
- double rows = planstate->instrument->ntuples / nloops;
+ double nloops = show_instrument->nloops;
+ double startup_ms = 1000.0 * show_instrument->startup / nloops;
+ double total_ms = 1000.0 * show_instrument->total / nloops;
+ double rows = show_instrument->ntuples / nloops;
if (es->format == EXPLAIN_FORMAT_TEXT)
{
@@ -1603,26 +1685,26 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Index Cond", planstate, ancestors, es);
if (((IndexScan *) plan)->indexqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_IndexOnlyScan:
show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
"Index Cond", planstate, ancestors, es);
if (((IndexOnlyScan *) plan)->indexqual)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
"Order By", planstate, ancestors, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (es->analyze)
ExplainPropertyFloat("Heap Fetches", NULL,
planstate->instrument->ntuples2, 0, es);
@@ -1636,11 +1718,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Recheck Cond", planstate, ancestors, es);
if (((BitmapHeapScan *) plan)->bitmapqualorig)
show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
+ show_instrument, es);
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (es->analyze)
show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
@@ -1658,7 +1740,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Gather:
{
@@ -1667,7 +1749,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
ExplainPropertyInteger("Workers Planned", NULL,
gather->num_workers, es);
@@ -1713,7 +1795,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
ExplainPropertyInteger("Workers Planned", NULL,
gm->num_workers, es);
@@ -1751,7 +1833,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_TableFuncScan:
if (es->verbose)
@@ -1765,7 +1847,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_TidScan:
{
@@ -1781,14 +1863,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
}
break;
case T_ForeignScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
case T_CustomScan:
@@ -1798,7 +1880,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
if (css->methods->ExplainCustomScan)
css->methods->ExplainCustomScan(css, ancestors, es);
}
@@ -1808,11 +1890,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_MergeJoin:
show_upper_qual(((MergeJoin *) plan)->mergeclauses,
@@ -1821,11 +1903,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((MergeJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_HashJoin:
show_upper_qual(((HashJoin *) plan)->hashclauses,
@@ -1834,25 +1916,25 @@ ExplainNode(PlanState *planstate, List *ancestors,
"Join Filter", planstate, ancestors, es);
if (((HashJoin *) plan)->join.joinqual)
show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
+ show_instrument, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
+ show_instrument, es);
break;
case T_Agg:
show_agg_keys(castNode(AggState, planstate), ancestors, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Group:
show_group_keys(castNode(GroupState, planstate), ancestors, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_Sort:
show_sort_keys(castNode(SortState, planstate), ancestors, es);
@@ -1868,7 +1950,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
+ show_instrument, es);
break;
case T_ModifyTable:
show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
@@ -1882,8 +1964,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* Show buffer usage */
- if (es->buffers && planstate->instrument)
- show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->buffers && show_instrument)
+ show_buffer_usage(es, &show_instrument->bufusage);
/* Show worker detail */
if (es->analyze && es->verbose && !es->hide_workers &&
@@ -1891,65 +1973,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
{
WorkerInstrumentation *w = planstate->worker_instrument;
bool opened_group = false;
- int n;
-
- for (n = 0; n < w->num_workers; ++n)
- {
- Instrumentation *instrument = &w->instrument[n];
- double nloops = instrument->nloops;
- double startup_ms;
- double total_ms;
- double rows;
-
- if (nloops <= 0)
- continue;
- startup_ms = 1000.0 * instrument->startup / nloops;
- total_ms = 1000.0 * instrument->total / nloops;
- rows = instrument->ntuples / nloops;
-
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "Worker %d: ", n);
- if (es->timing)
- appendStringInfo(es->str,
- "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
- startup_ms, total_ms, rows, nloops);
- else
- appendStringInfo(es->str,
- "actual rows=%.0f loops=%.0f\n",
- rows, nloops);
- es->indent++;
- if (es->buffers)
- show_buffer_usage(es, &instrument->bufusage);
- es->indent--;
- }
- else
- {
- if (!opened_group)
- {
- ExplainOpenGroup("Workers", "Workers", false, es);
- opened_group = true;
- }
- ExplainOpenGroup("Worker", NULL, true, es);
- ExplainPropertyInteger("Worker Number", NULL, n, es);
-
- if (es->timing)
- {
- ExplainPropertyFloat("Actual Startup Time", "ms",
- startup_ms, 3, es);
- ExplainPropertyFloat("Actual Total Time", "ms",
- total_ms, 3, es);
- }
- ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
- ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
- if (es->buffers)
- show_buffer_usage(es, &instrument->bufusage);
+ /* Leader */
+ ExplainNodePerProcess(es, &opened_group, -1, planstate->instrument);
- ExplainCloseGroup("Worker", NULL, true, es);
- }
- }
+ /* Workers */
+ for (int n = 0; n < w->num_workers; ++n)
+ ExplainNodePerProcess(es, &opened_group, n, &w->instrument[n]);
if (opened_group)
ExplainCloseGroup("Workers", "Workers", false, es);
@@ -2776,19 +2806,19 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
*/
static void
show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
+ Instrumentation *instrument, ExplainState *es)
{
double nfiltered;
double nloops;
- if (!es->analyze || !planstate->instrument)
+ if (!es->analyze || !instrument)
return;
if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
+ nfiltered = instrument->nfiltered2;
else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
+ nfiltered = instrument->nfiltered1;
+ nloops = instrument->nloops;
/* In text mode, suppress zero counts; they're not interesting enough */
if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
@@ -3306,7 +3336,8 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
{
show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
&mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1,
+ mtstate->ps.instrument, es);
}
/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..b3e615aea0 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -989,20 +989,22 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
if (i >= instrumentation->num_plan_nodes)
elog(ERROR, "plan node %d not found", plan_node_id);
- /* Accumulate the statistics from all workers. */
- instrument = GetInstrumentationArray(instrumentation);
- instrument += i * instrumentation->num_workers;
- for (n = 0; n < instrumentation->num_workers; ++n)
- InstrAggNode(planstate->instrument, &instrument[n]);
-
/*
- * Also store the per-worker detail.
- *
* Worker instrumentation should be allocated in the same context as the
* regular instrumentation information, which is the per-query context.
* Switch into per-query memory context.
*/
oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
+
+ /* Accumulate the statistics from all workers. */
+ if (!planstate->combined_instrument)
+ planstate->combined_instrument = palloc0(sizeof(Instrumentation));
+ instrument = GetInstrumentationArray(instrumentation);
+ instrument += i * instrumentation->num_workers;
+ for (n = 0; n < instrumentation->num_workers; ++n)
+ InstrAggNode(planstate->combined_instrument, &instrument[n]);
+
+ /* Also store the per-worker detail. */
ibytes = mul_size(instrumentation->num_workers, sizeof(Instrumentation));
planstate->worker_instrument =
palloc(ibytes + offsetof(WorkerInstrumentation, instrument));
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index eaea1f3b0c..3968f7e0e5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -947,6 +947,7 @@ typedef struct PlanState
* wrapper */
Instrumentation *instrument; /* Optional runtime stats for this node */
+ Instrumentation *combined_instrument; /* sum for leader and workers */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
/* Per-worker JIT instrumentation */
--
2.20.1 (Apple Git-117)
So, I think from a code review perspective the code in the patches
LGTM.
As for the EXPLAIN ANALYZE tests--I don't see that many of them in
regress, so maybe that's because they aren't normally very useful. In
this case, it would only be to protect against regressions in printing
the leader instrumentation, I think.
The problem with that is, even with all the non-deterministic info
redacted, if the leader doesn't participate (which is not guaranteed),
then its stats wouldn't be printed at all and that would cause an
incorrectly failing test case...okay I just talked myself out of the
usefulness of testing this.
So, I would move it to "ready for committer", but, since it is not
applying cleanly, I have changed the status to "waiting on author".
On Sat, Jan 25, 2020 at 3:39 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
So, I think from a code review perspective the code in the patches
LGTM.
As for the EXPLAIN ANALYZE tests--I don't see that many of them in
regress, so maybe that's because they aren't normally very useful. In
this case, it would only be to protect against regressions in printing
the leader instrumentation, I think.
The problem with that is, even with all the non-deterministic info
redacted, if the leader doesn't participate (which is not guaranteed),
then its stats wouldn't be printed at all and that would cause an
incorrectly failing test case...okay I just talked myself out of the
usefulness of testing this.
So, I would move it to "ready for committer", but, since it is not
applying cleanly, I have changed the status to "waiting on author".
Hi Melanie,
Thanks for the reviews!
I think I'm going to abandon 0002 for now, because that stuff is being
refactored independently over here, so rebasing would be futile:
/messages/by-id/CAOtHd0AvAA8CLB9Xz0wnxu1U=zJCKrr1r4QwwXi_kcQsHDVU=Q@mail.gmail.com
On that basis, I've set it to ready for committer (meaning 0001 only).
Thanks for the rebase. I'll let that sit for a couple of days and see
if anything conflicting comes out of that other thread. It's a fair
complaint that we lack tests that show the new output; I'll think
about adding one too.
Thomas Munro <thomas.munro@gmail.com> writes:
I think I'm going to abandon 0002 for now, because that stuff is being
refactored independently over here, so rebasing would be futile:
/messages/by-id/CAOtHd0AvAA8CLB9Xz0wnxu1U=zJCKrr1r4QwwXi_kcQsHDVU=Q@mail.gmail.com
Yeah, your 0002 needs some rethinking. I kind of like the proposed
change in the text-format output:
Workers Launched: 4
-> Sort (actual rows=2000 loops=15)
Sort Key: tenk1.ten
- Sort Method: quicksort Memory: xxx
+ Leader: Sort Method: quicksort Memory: xxx
Worker 0: Sort Method: quicksort Memory: xxx
Worker 1: Sort Method: quicksort Memory: xxx
Worker 2: Sort Method: quicksort Memory: xxx
but it's quite unclear to me how that translates into non-text
formats, especially if we're not to break invariants about which
fields are present in a non-text output structure (cf [1]/messages/by-id/19416.1580069629@sss.pgh.pa.us).
I've occasionally wondered whether we'd be better off presenting
this info as if the leader were "worker 0" and then the N workers
are workers 1 to N. I've not worked out the implications of that
in any detail though. It's fairly easy to see what to do for
fields that can be aggregated (the numbers printed for the node
as a whole are totals), but it doesn't help us any with something
like Sort Method.
On a narrower note, I'm not at all happy with the fact that 0001
adds yet another field to *every* PlanState. I think this is
doubling down on a fundamentally wrong decision to have
ExecParallelRetrieveInstrumentation do some aggregation immediately.
I think we should abandon that and just say that it returns the raw
leader and per-worker data, and then explain.c can aggregate as it
wishes.
regards, tom lane
On Mon, Jan 27, 2020 at 11:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I've occasionally wondered whether we'd be better off presenting
this info as if the leader were "worker 0" and then the N workers
are workers 1 to N. I've not worked out the implications of that
in any detail though. It's fairly easy to see what to do for
fields that can be aggregated (the numbers printed for the node
as a whole are totals), but it doesn't help us any with something
like Sort Method.
Yeah, in the 0001 patch (which no longer applies and probably just
needs to be rewritten now), I used "Leader:" in the text format, but
worker number -1 in the structured formats, which I expected some
blowback on. I also thought about adding one to all the numbers as
you suggest.
In PHJ I had a related problem: I had to +1 the worker number to get a
zero-based "participant number" so that the leader would have a slot
in various data structures, and I wondered if we shouldn't just do
that to the whole system (eg not just in explain's output or in
localised bits of PHJ code).
On a narrower note, I'm not at all happy with the fact that 0001
adds yet another field to *every* PlanState. I think this is
doubling down on a fundamentally wrong decision to have
ExecParallelRetrieveInstrumentation do some aggregation immediately.
I think we should abandon that and just say that it returns the raw
leader and per-worker data, and then explain.c can aggregate as it
wishes.
Fair point. I will look into that.
Hi Thomas,
On 1/26/20 7:03 PM, Thomas Munro wrote:
Fair point. I will look into that.
Are you still planning on looking at this patch for PG13?
Based on the current state (002 abandoned, 001 needs total rework) I'd
say it should just be Returned with Feedback or Closed for now.
Regards,
--
-David
david@pgmasters.net
On Tue, Mar 17, 2020 at 2:39 AM David Steele <david@pgmasters.net> wrote:
On 1/26/20 7:03 PM, Thomas Munro wrote:
Fair point. I will look into that.
Are you still planning on looking at this patch for PG13?
Based on the current state (002 abandoned, 001 needs total rework) I'd
say it should just be Returned with Feedback or Closed for now.
When you put it like that, yeah :-) I marked it returned with
feedback. Thanks Melanie and Rafia for the reviews so far, and I'll
be back with a new version for PG14.
On Thu, Nov 7, 2019 at 9:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Nov 7, 2019 at 11:37 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
...
Also, I noticed that the worker details are displayed for sort node even without verbose, but for scans it is only with verbose. Am I missing something or there is something behind? However, I am not sure if this is the introduced by this patch-set.Yeah, it's a pre-existing thing, but I agree it's an interesting
difference. We currently don't have a way to show a 'combined'
version of a parallel (oblivious) sort: we always show the per-process
version, and all this patch changes is how we label the leader's
stats. I suppose someone could argue that in non-VERBOSE mode we
should show the total memory usage (sum from all processes). I suppose
it's possible they use different sort types (one worker runs out of
work_mem and another doesn't), and I'm not sure how how you'd
aggregate that.
Over at [1]/messages/by-id/CAAaqYe_ctGqQsauuYS5StPULkES7=t8vNwvEPyzXQdbjAuZ6vA@mail.gmail.com (incremental sort patch) I had a similar question, since
each sort node (even non-parallel) can execute multiple tuplesorts.
The approach I took was to show both average and max for both disk and
memory usage as well as all sort strategies used. It looks like this:
-> Incremental Sort
Sort Key: a, b
Presorted Key: a
Full-sort Groups: 4 (Methods: quicksort) Memory: 26kB (avg), 26kB (max)
-> Index Scan using idx_t_a...
It'd be great if that had a use here too :)
James
[1]: /messages/by-id/CAAaqYe_ctGqQsauuYS5StPULkES7=t8vNwvEPyzXQdbjAuZ6vA@mail.gmail.com