Duplicate Workers entries in some EXPLAIN plans

Started by Maciek Sakrejdaabout 6 years ago27 messages
#1Maciek Sakrejda
m.sakrejda@gmail.com

Hello,

I originally reported this in pgsql-bugs [0]/messages/by-id/CADXhmgSr807j2Pc9aUjW2JOzOBe3FeYnQBe_f9U+-Mm4b1HRUw@mail.gmail.com, but there wasn't much
feedback there, so moving the discussion here. When using JSON, YAML, or
XML-format EXPLAIN on a plan that uses a parallelized sort, the Sort nodes
list two different entries for "Workers", one for the sort-related info,
and one for general worker info. This is what this looks like in JSON (some
details elided):

{
"Node Type": "Sort",
...
"Workers": [
{
"Worker Number": 0,
"Sort Method": "external merge",
"Sort Space Used": 20128,
"Sort Space Type": "Disk"
},
{
"Worker Number": 1,
"Sort Method": "external merge",
"Sort Space Used": 20128,
"Sort Space Type": "Disk"
}
],
...
"Workers": [
{
"Worker Number": 0,
"Actual Startup Time": 309.726,
"Actual Total Time": 310.179,
"Actual Rows": 4128,
"Actual Loops": 1,
"Shared Hit Blocks": 2872,
"Shared Read Blocks": 7584,
"Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0,
"Local Hit Blocks": 0,
"Local Read Blocks": 0,
"Local Dirtied Blocks": 0,
"Local Written Blocks": 0,
"Temp Read Blocks": 490,
"Temp Written Blocks": 2529
},
{
"Worker Number": 1,
"Actual Startup Time": 306.523,
"Actual Total Time": 307.001,
"Actual Rows": 4128,
"Actual Loops": 1,
"Shared Hit Blocks": 3356,
"Shared Read Blocks": 7100,
"Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0,
"Local Hit Blocks": 0,
"Local Read Blocks": 0,
"Local Dirtied Blocks": 0,
"Local Written Blocks": 0,
"Temp Read Blocks": 490,
"Temp Written Blocks": 2529
}
],
"Plans:" ...
}

This is technically valid JSON, but it's extremely difficult to work with,
since default JSON parsing in Ruby, node, Python, Go, and even Postgres'
own jsonb only keep the latest key--the sort information is discarded
(other languages probably don't fare much better; this is what I had on
hand). As Tom Lane pointed out in my pgsql-bugs thread, this has been
reported before [1]/messages/by-id/41ee53a5-a36e-cc8f-1bee-63f6565bb1ee@dalibo.com and in that earlier thread, Andrew Dunstan suggested
that perhaps the simplest solution is to just rename the sort-related
Workers node. Tom expressed some concerns about a breaking change here,
though I think the current behavior means vanishingly few users are parsing
this data correctly. Thoughts?

Thanks,
Maciek

[0]: /messages/by-id/CADXhmgSr807j2Pc9aUjW2JOzOBe3FeYnQBe_f9U+-Mm4b1HRUw@mail.gmail.com
/messages/by-id/CADXhmgSr807j2Pc9aUjW2JOzOBe3FeYnQBe_f9U+-Mm4b1HRUw@mail.gmail.com
[1]: /messages/by-id/41ee53a5-a36e-cc8f-1bee-63f6565bb1ee@dalibo.com
/messages/by-id/41ee53a5-a36e-cc8f-1bee-63f6565bb1ee@dalibo.com

#2Andres Freund
andres@anarazel.de
In reply to: Maciek Sakrejda (#1)
Re: Duplicate Workers entries in some EXPLAIN plans

Hi,

On 2019-10-22 11:58:35 -0700, Maciek Sakrejda wrote:

I originally reported this in pgsql-bugs [0], but there wasn't much
feedback there, so moving the discussion here. When using JSON, YAML, or
XML-format EXPLAIN on a plan that uses a parallelized sort, the Sort nodes
list two different entries for "Workers", one for the sort-related info,
and one for general worker info. This is what this looks like in JSON (some
details elided):

{
"Node Type": "Sort",
...
"Workers": [
{
"Worker Number": 0,
"Sort Method": "external merge",
"Sort Space Used": 20128,
"Sort Space Type": "Disk"
},
{
"Worker Number": 1,
"Sort Method": "external merge",
"Sort Space Used": 20128,
"Sort Space Type": "Disk"
}
],
...
"Workers": [
{

This is technically valid JSON, but it's extremely difficult to work with,
since default JSON parsing in Ruby, node, Python, Go, and even Postgres'
own jsonb only keep the latest key

It's also quite confusing.

As Tom Lane pointed out in my pgsql-bugs thread, this has been
reported before [1] and in that earlier thread, Andrew Dunstan suggested
that perhaps the simplest solution is to just rename the sort-related
Workers node. Thoughts?

Yea, I think we should fix this. The current output basically makes no
sense.

Tom expressed some concerns about a breaking change here,
though I think the current behavior means vanishingly few users are parsing
this data correctly.

Well, in a lot of the cases there's no parallel output for the sort, and
in other cases BUFFERS is not specified. In either case the 'duplicate
key' problem won't exist then.

While Tom said:

On 2019-10-16 09:16:56 +0200, Tom Lane wrote:

I think the text-mode output is intentional, but the other formats
need more work.

Sort Method: external merge Disk: 4920kB
Worker 0: Sort Method: external merge Disk: 5880kB
Worker 1: Sort Method: external merge Disk: 5920kB
Buffers: shared hit=682 read=10188, temp read=1415 written=2101
Worker 0: actual time=130.058..130.324 rows=1324 loops=1
Buffers: shared hit=337 read=3489, temp read=505 written=739
Worker 1: actual time=130.273..130.512 rows=1297 loops=1
Buffers: shared hit=345 read=3507, temp read=505 written=744

I don't think this is close to being good enough to be worth
preserving. I think it's worth avoiding unnecessary breakage of explain
output, but we also shouldn't endlessly carry forward confusing output,
just because of that.

It clearly seems like it'd be better if this instead were

Sort Method: external merge Disk: 4920kB
Buffers: shared hit=682 read=10188, temp read=1415 written=2101
Worker 0: actual time=130.058..130.324 rows=1324 loops=1
Sort Method: external merge Disk: 5880kB
Buffers: shared hit=337 read=3489, temp read=505 written=739
Worker 1: actual time=130.273..130.512 rows=1297 loops=1
Buffers: shared hit=345 read=3507, temp read=505 written=744
Sort Method: external merge Disk: 5920kB

I think the way this information was added in bf11e7ee2e36 and
33001fd7a707, contrasting to the output added in b287df70e408, is just
not right. If we add similar instrumentation reporting to more nodes,
we'll end up with duplicated information all over. Additionally the
per-worker part of show_sort_info() basically just duplicated the rest
of the function. I then also did something similar (although luckily
with a different key...), with the ExplainPrintJIT() call for Gather
nodes.

Unfortunately I think the fix isn't all that trivial, due to the way we
output the per-worker information at the end of ExplainNode(), by just
dumping things into a string. It seems to me that a step in the right
direction would be for ExplainNode() to create
planstate->worker_instrument StringInfos, which can be handed to
routines like show_sort_info(), which would print the per-node
information into that, rather than directly dumping into
es->output. Most of the current "Show worker detail" would be done
earlier in ExplainNode(), at the place where we current display the
"actual rows" bit.

ISTM that should include removing the duplication fo the the contents of
show_sort_info(), and probably also for the Gather, GatherMerge blocks
(I've apparently skipped adding the JIT information to the latter, not
sure if we ought to fix that in the stable branches).

Any chance you want to take a stab at that?

I don't think we'll fix it soon, but damn, all this string appending
around just isn't a great way to reliably build nested data formats.

Greetings,

Andres Freund

#3Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Andres Freund (#2)
1 attachment(s)
Re: Duplicate Workers entries in some EXPLAIN plans

On Thu, Oct 24, 2019 at 6:48 PM Andres Freund <andres@anarazel.de> wrote:

Unfortunately I think the fix isn't all that trivial, due to the way we
output the per-worker information at the end of ExplainNode(), by just
dumping things into a string. It seems to me that a step in the right
direction would be for ExplainNode() to create
planstate->worker_instrument StringInfos, which can be handed to
routines like show_sort_info(), which would print the per-node
information into that, rather than directly dumping into
es->output. Most of the current "Show worker detail" would be done
earlier in ExplainNode(), at the place where we current display the
"actual rows" bit.

ISTM that should include removing the duplication fo the the contents of
show_sort_info(), and probably also for the Gather, GatherMerge blocks
(I've apparently skipped adding the JIT information to the latter, not
sure if we ought to fix that in the stable branches).

Any chance you want to take a stab at that?

It took me a while, but I did take a stab at it (thanks for your
off-list help). Attached is my patch that changes the structured
formats to merge sort worker output in with costs/timing/buffers
worker output. I have not touched any other worker output yet, since
it's not under a Workers group as far as I can tell (so it does not
exhibit the problem I originally reported). I can try to make further
changes here if the approach is deemed sound. I also have not touched
text output; above you had proposed

Sort Method: external merge Disk: 4920kB
Buffers: shared hit=682 read=10188, temp read=1415 written=2101
Worker 0: actual time=130.058..130.324 rows=1324 loops=1
Sort Method: external merge Disk: 5880kB
Buffers: shared hit=337 read=3489, temp read=505 written=739
Worker 1: actual time=130.273..130.512 rows=1297 loops=1
Buffers: shared hit=345 read=3507, temp read=505 written=744
Sort Method: external merge Disk: 5920kB

which makes sense to me, but I'd like to confirm this is the approach
we want before I add it to the patch.

This is my first C in close to a decade (and I was never much of a C
programmer to begin with), so please be gentle.

As Andres suggested off-list, I also changed the worker output to
order fields that also occur in the parent node in the same way as the
parent node.

I've also added a test for the patch, and because this is really an
EXPLAIN issue rather than a query feature issue, I added a
src/test/regress/sql/explain.sql for the test. I added a couple of
utility functions for munging json-formatted EXPLAIN plans into
something we can repeatably verify in regression tests (the functions
use json rather than jsonb to preserve field order). I have not added
this for YAML or XML (even though they should behave the same way),
since I'm not familiar with the the functions to manipulate those data
types in a similar way (if they exist). My hunch is due to the
similarity of structured formats, just testing JSON is enough, but I
can expand/adjust tests as necessary.

Attachments:

merge-explain-worker-output.patchtext/x-patch; charset=US-ASCII; name=merge-explain-worker-output.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62fb3434a3..8f898fc20c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 								   Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -290,6 +290,7 @@ NewExplainState(void)
 	es->costs = true;
 	/* Prepare output buffer. */
 	es->str = makeStringInfo();
+	es->root_str = es->str;
 
 	return es;
 }
@@ -1073,9 +1074,20 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	StringInfo *worker_strs = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/* Prepare per-worker output */
+	if (es->analyze && planstate->worker_instrument) {
+		int num_workers = planstate->worker_instrument->num_workers;
+		int n;
+		worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+		for (n = 0; n < num_workers; n++) {
+			worker_strs[n] = makeStringInfo();
+		}
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1340,6 +1352,64 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
 	}
 
+	/* Prepare worker general execution details */
+	if (es->analyze && es->verbose && planstate->worker_instrument)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+		int			n;
+
+		for (n = 0; n < w->num_workers; ++n)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				appendStringInfoSpaces(es->str, es->indent * 2);
+				appendStringInfo(es->str, "Worker %d: ", n);
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+				es->indent++;
+				if (es->buffers)
+					show_buffer_usage(es, &instrument->bufusage);
+				es->indent--;
+			}
+			else
+			{
+				ExplainOpenWorker(worker_strs[n], es);
+
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+		}
+
+		if (es->format != EXPLAIN_FORMAT_TEXT) {
+			ExplainCloseWorker(es);
+		}
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_SeqScan:
@@ -1839,7 +1909,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			break;
 		case T_Sort:
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
-			show_sort_info(castNode(SortState, planstate), es);
+			show_sort_info(castNode(SortState, planstate), worker_strs, es);
 			break;
 		case T_MergeAppend:
 			show_merge_append_keys(castNode(MergeAppendState, planstate),
@@ -1868,73 +1938,30 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && planstate->worker_instrument)
+	/* Prepare worker buffer usage */
+	if (es->buffers && es->analyze && es->verbose && planstate->worker_instrument
+		&& es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		WorkerInstrumentation *w = planstate->worker_instrument;
-		bool		opened_group = false;
 		int			n;
 
 		for (n = 0; n < w->num_workers; ++n)
 		{
 			Instrumentation *instrument = &w->instrument[n];
 			double		nloops = instrument->nloops;
-			double		startup_ms;
-			double		total_ms;
-			double		rows;
 
 			if (nloops <= 0)
 				continue;
-			startup_ms = 1000.0 * instrument->startup / nloops;
-			total_ms = 1000.0 * instrument->total / nloops;
-			rows = instrument->ntuples / nloops;
 
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				appendStringInfo(es->str, "Worker %d: ", n);
-				if (es->timing)
-					appendStringInfo(es->str,
-									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
-									 startup_ms, total_ms, rows, nloops);
-				else
-					appendStringInfo(es->str,
-									 "actual rows=%.0f loops=%.0f\n",
-									 rows, nloops);
-				es->indent++;
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-				es->indent--;
-			}
-			else
-			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
-
-				if (es->timing)
-				{
-					ExplainPropertyFloat("Actual Startup Time", "ms",
-										 startup_ms, 3, es);
-					ExplainPropertyFloat("Actual Total Time", "ms",
-										 total_ms, 3, es);
-				}
-				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
-				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-
-				ExplainCloseGroup("Worker", NULL, true, es);
-			}
+			ExplainOpenWorker(worker_strs[n], es);
+			show_buffer_usage(es, &instrument->bufusage);
 		}
+		ExplainCloseWorker(es);
+	}
 
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+	/* Show worker detail */
+	if (planstate->worker_instrument) {
+		ExplainFlushWorkers(worker_strs, planstate->worker_instrument->num_workers, es);
 	}
 
 	/* Get ready to display the child plans */
@@ -2534,7 +2561,7 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
  * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
  */
 static void
-show_sort_info(SortState *sortstate, ExplainState *es)
+show_sort_info(SortState *sortstate, StringInfo *worker_strs, ExplainState *es)
 {
 	if (!es->analyze)
 		return;
@@ -2569,7 +2596,6 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
-		bool		opened_group = false;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
@@ -2594,21 +2620,15 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 			}
 			else
 			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
+				ExplainOpenWorker(worker_strs[n], es);
 				ExplainPropertyText("Sort Method", sortMethod, es);
 				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
 				ExplainPropertyText("Sort Space Type", spaceType, es);
-				ExplainCloseGroup("Worker", NULL, true, es);
 			}
 		}
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+		if (es->format != EXPLAIN_FORMAT_TEXT) {
+			ExplainCloseWorker(es);
+		}
 	}
 }
 
@@ -3716,6 +3736,62 @@ ExplainCloseGroup(const char *objtype, const char *labelname,
 	}
 }
 
+/*
+ * Begin output for a specific worker. If we are currently producing output
+ * for another worker, close that worker automatically.
+ */
+void
+ExplainOpenWorker(StringInfo worker_str, ExplainState *es)
+{
+	/*
+	 * We indent twice--once for the "Worker" group and once for the "Workers"
+	 * group--but only need to adjust indentation once for each string of
+	 * ExplainOpenWorker calls.
+	 */
+	if (es->str == es->root_str) {
+		es->indent += 2;
+	}
+
+	es->print_workers = true;
+	es->str = worker_str;
+}
+
+/*
+ * Explicitly end output for the current worker and resume output
+ * for the root of the node.
+ */
+void
+ExplainCloseWorker(ExplainState *es)
+{
+	es->indent -= 2;
+	es->str = es->root_str;
+}
+
+/*
+ * Flush output registered so far to the output buffer and deallocate
+ * mechanism for writing to workers.
+ */
+void
+ExplainFlushWorkers(StringInfo* worker_strs, int num_workers, ExplainState *es)
+{
+	int i;
+
+	if (!es->print_workers) {
+		return;
+	}
+
+	ExplainOpenGroup("Workers", "Workers", false, es);
+	for (i = 0; i < num_workers; i++) {
+		ExplainOpenGroup("Worker", NULL, true, es);
+		ExplainPropertyInteger("Worker Number", NULL, i, es);
+		appendStringInfoString(es->str, worker_strs[i]->data);
+		ExplainCloseGroup("Worker", NULL, true, es);
+	}
+	ExplainCloseGroup("Workers", "Workers", false, es);
+	// do we have any other cleanup to do?
+	es->print_workers = false;
+}
+
 /*
  * Emit a "dummy" group that never has any members.
  *
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 8639891c16..e16f6872d8 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -27,7 +27,7 @@ typedef enum ExplainFormat
 
 typedef struct ExplainState
 {
-	StringInfo	str;			/* output buffer */
+	StringInfo	str;			/* current output buffer */
 	/* options */
 	bool		verbose;		/* be verbose */
 	bool		analyze;		/* print actual times */
@@ -40,6 +40,8 @@ typedef struct ExplainState
 	/* state for output formatting --- not reset for each new plan tree */
 	int			indent;			/* current indentation level */
 	List	   *grouping_stack; /* format-specific grouping state */
+	bool        print_workers;  /* whether current node has worker metadata */
+	StringInfo	root_str;		/* root output buffer */
 	/* state related to the current plan tree (filled by ExplainPrintPlan) */
 	PlannedStmt *pstmt;			/* top of plan */
 	List	   *rtable;			/* range table */
@@ -110,4 +112,9 @@ extern void ExplainOpenGroup(const char *objtype, const char *labelname,
 extern void ExplainCloseGroup(const char *objtype, const char *labelname,
 							  bool labeled, ExplainState *es);
 
+extern void ExplainOpenWorker(StringInfo worker_str, ExplainState *es);
+extern void ExplainCloseWorker(ExplainState *es);
+extern void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, ExplainState *es);
+
+
 #endif							/* EXPLAIN_H */
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
new file mode 100644
index 0000000000..db9914db7b
--- /dev/null
+++ b/src/test/regress/expected/explain.out
@@ -0,0 +1,75 @@
+--
+-- EXPLAIN
+--
+-- helper functions for examining json explain output
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+                                                                                                                                                                                                                                                                                                                                                                                                                                                              explain_workers                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ { "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Sort Method" : "quicksort", "Sort Space Used" : 0, "Sort Space Type" : "Memory", "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }], "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }] }] }] }
+(1 row)
+
+reset force_parallel_mode;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index d33a4e143d..d2b17dd3ea 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index f86f5c5682..acba391332 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -193,6 +193,7 @@ test: indexing
 test: partition_aggregate
 test: partition_info
 test: tuplesort
+test: explain
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
new file mode 100644
index 0000000000..5fb6e7d2e1
--- /dev/null
+++ b/src/test/regress/sql/explain.sql
@@ -0,0 +1,77 @@
+--
+-- EXPLAIN
+--
+
+-- helper functions for examining json explain output
+
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+
+reset force_parallel_mode;
#4Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Maciek Sakrejda (#3)
Re: Duplicate Workers entries in some EXPLAIN plans

I wanted to follow up on this patch since I received no feedback. What
should my next steps be (besides rebasing, though I want to confirm there's
interest before I do that)?

#5Julien Rouhaud
rjuju123@gmail.com
In reply to: Maciek Sakrejda (#4)
Re: Duplicate Workers entries in some EXPLAIN plans

On Fri, Dec 27, 2019 at 12:31 AM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:

I wanted to follow up on this patch since I received no feedback. What should my next steps be (besides rebasing, though I want to confirm there's interest before I do that)?

Given Andres' answer I'd say that there's interest in this patch. You
should register this patch in the next commitfest
(https://commitfest.postgresql.org/26/) to make sure that it's not
forgotten, which unfortunately is probably what happened here .

#6Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Julien Rouhaud (#5)
Re: Duplicate Workers entries in some EXPLAIN plans

Done! Thanks!

#7Georgios Kokolatos
gkokolatos@pm.me
In reply to: Maciek Sakrejda (#6)
Re: Duplicate Workers entries in some EXPLAIN plans

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

This is a high level review only. However, seeing that there is a conflict and does not merge cleanly after commit <b925a00f4ef>, I return to Author.

To be fair the resolution seems quite straight forward and I took the liberty of applying the necessary changes to include the new element of ExplainState introduced in the above commit, namely hide_workers. However since the author might have a different idea on how to incorporate this change I leave it up to him.

Another very high level comment is the introduction of a new test file, namely explain. Seeing `explain.sql` in the tests suits, personally and very opinion based, I would have expected the whole spectrum of the explain properties to be tested. However only a slight fraction of the functionality is tested. Since this is a bit more of a personal opinion, I don't expect any changes unless the author happens to agree.

Other than these minor nitpick, the code seems clear, concise and does what it says on the box. It follows the comments in the discussion thread(s) and solves a real issue.

Please have a look on how commit <b925a00f4ef> affects this patch and rebase.

The new status of this patch is: Waiting on Author

#8Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Georgios Kokolatos (#7)
1 attachment(s)
Re: Duplicate Workers entries in some EXPLAIN plans

Thanks for the review! I looked at <b925a00f4ef> and rebased the patch
on current master, ac5bdf6.

I introduced a new test file because this bug is specifically about
EXPLAIN output (as opposed to query execution or planning
functionality), and it didn't seem like a test would fit in any of the
other files. I focused on testing just the behavior around this
specific bug (and fix). I think eventually we should probably test
other more fundamental EXPLAIN features (and I'm happy to contribute
to that) in that file, but that seems outside of the scope of this
patch.

Any thoughts on what we should do with text mode output (which is
untouched right now)? The output Andres proposed above makes sense to
me, but I'd like to get more input.

Attachments:

merge-explain-worker-output-v2.patchtext/x-patch; charset=US-ASCII; name=merge-explain-worker-output-v2.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..96a8973884 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 								   Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -289,6 +289,7 @@ NewExplainState(void)
 	es->costs = true;
 	/* Prepare output buffer. */
 	es->str = makeStringInfo();
+	es->root_str = es->str;
 
 	return es;
 }
@@ -1090,9 +1091,20 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	StringInfo *worker_strs = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/* Prepare per-worker output */
+	if (es->analyze && planstate->worker_instrument) {
+		int num_workers = planstate->worker_instrument->num_workers;
+		int n;
+		worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+		for (n = 0; n < num_workers; n++) {
+			worker_strs[n] = makeStringInfo();
+		}
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1357,6 +1369,64 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
 	}
 
+	/* Prepare worker general execution details */
+	if (es->analyze && es->verbose && planstate->worker_instrument)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+		int			n;
+
+		for (n = 0; n < w->num_workers; ++n)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				appendStringInfoSpaces(es->str, es->indent * 2);
+				appendStringInfo(es->str, "Worker %d: ", n);
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+				es->indent++;
+				if (es->buffers)
+					show_buffer_usage(es, &instrument->bufusage);
+				es->indent--;
+			}
+			else
+			{
+				ExplainOpenWorker(worker_strs[n], es);
+
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+		}
+
+		if (es->format != EXPLAIN_FORMAT_TEXT) {
+			ExplainCloseWorker(es);
+		}
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_SeqScan:
@@ -1856,7 +1926,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			break;
 		case T_Sort:
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
-			show_sort_info(castNode(SortState, planstate), es);
+			show_sort_info(castNode(SortState, planstate), worker_strs, es);
 			break;
 		case T_MergeAppend:
 			show_merge_append_keys(castNode(MergeAppendState, planstate),
@@ -1885,74 +1955,30 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && !es->hide_workers &&
-		planstate->worker_instrument)
+	/* Prepare worker buffer usage */
+	if (es->buffers && es->analyze && es->verbose && !es->hide_workers
+		&& planstate->worker_instrument && es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		WorkerInstrumentation *w = planstate->worker_instrument;
-		bool		opened_group = false;
 		int			n;
 
 		for (n = 0; n < w->num_workers; ++n)
 		{
 			Instrumentation *instrument = &w->instrument[n];
 			double		nloops = instrument->nloops;
-			double		startup_ms;
-			double		total_ms;
-			double		rows;
 
 			if (nloops <= 0)
 				continue;
-			startup_ms = 1000.0 * instrument->startup / nloops;
-			total_ms = 1000.0 * instrument->total / nloops;
-			rows = instrument->ntuples / nloops;
 
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				appendStringInfo(es->str, "Worker %d: ", n);
-				if (es->timing)
-					appendStringInfo(es->str,
-									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
-									 startup_ms, total_ms, rows, nloops);
-				else
-					appendStringInfo(es->str,
-									 "actual rows=%.0f loops=%.0f\n",
-									 rows, nloops);
-				es->indent++;
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-				es->indent--;
-			}
-			else
-			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
-
-				if (es->timing)
-				{
-					ExplainPropertyFloat("Actual Startup Time", "ms",
-										 startup_ms, 3, es);
-					ExplainPropertyFloat("Actual Total Time", "ms",
-										 total_ms, 3, es);
-				}
-				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
-				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-
-				ExplainCloseGroup("Worker", NULL, true, es);
-			}
+			ExplainOpenWorker(worker_strs[n], es);
+			show_buffer_usage(es, &instrument->bufusage);
 		}
+		ExplainCloseWorker(es);
+	}
 
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+	/* Show worker detail */
+	if (planstate->worker_instrument) {
+		ExplainFlushWorkers(worker_strs, planstate->worker_instrument->num_workers, es);
 	}
 
 	/* Get ready to display the child plans */
@@ -2552,7 +2578,7 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
  * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
  */
 static void
-show_sort_info(SortState *sortstate, ExplainState *es)
+show_sort_info(SortState *sortstate, StringInfo *worker_strs, ExplainState *es)
 {
 	if (!es->analyze)
 		return;
@@ -2593,7 +2619,6 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
-		bool		opened_group = false;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
@@ -2620,21 +2645,15 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 			}
 			else
 			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
+				ExplainOpenWorker(worker_strs[n], es);
 				ExplainPropertyText("Sort Method", sortMethod, es);
 				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
 				ExplainPropertyText("Sort Space Type", spaceType, es);
-				ExplainCloseGroup("Worker", NULL, true, es);
 			}
 		}
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+		if (es->format != EXPLAIN_FORMAT_TEXT) {
+			ExplainCloseWorker(es);
+		}
 	}
 }
 
@@ -3751,6 +3770,62 @@ ExplainCloseGroup(const char *objtype, const char *labelname,
 	}
 }
 
+/*
+ * Begin output for a specific worker. If we are currently producing output
+ * for another worker, close that worker automatically.
+ */
+void
+ExplainOpenWorker(StringInfo worker_str, ExplainState *es)
+{
+	/*
+	 * We indent twice--once for the "Worker" group and once for the "Workers"
+	 * group--but only need to adjust indentation once for each string of
+	 * ExplainOpenWorker calls.
+	 */
+	if (es->str == es->root_str) {
+		es->indent += 2;
+	}
+
+	es->print_workers = true;
+	es->str = worker_str;
+}
+
+/*
+ * Explicitly end output for the current worker and resume output
+ * for the root of the node.
+ */
+void
+ExplainCloseWorker(ExplainState *es)
+{
+	es->indent -= 2;
+	es->str = es->root_str;
+}
+
+/*
+ * Flush output registered so far to the output buffer and deallocate
+ * mechanism for writing to workers.
+ */
+void
+ExplainFlushWorkers(StringInfo* worker_strs, int num_workers, ExplainState *es)
+{
+	int i;
+
+	if (!es->print_workers) {
+		return;
+	}
+
+	ExplainOpenGroup("Workers", "Workers", false, es);
+	for (i = 0; i < num_workers; i++) {
+		ExplainOpenGroup("Worker", NULL, true, es);
+		ExplainPropertyInteger("Worker Number", NULL, i, es);
+		appendStringInfoString(es->str, worker_strs[i]->data);
+		ExplainCloseGroup("Worker", NULL, true, es);
+	}
+	ExplainCloseGroup("Workers", "Workers", false, es);
+	// do we have any other cleanup to do?
+	es->print_workers = false;
+}
+
 /*
  * Emit a "dummy" group that never has any members.
  *
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 45027a7c47..2919b2780d 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -27,7 +27,7 @@ typedef enum ExplainFormat
 
 typedef struct ExplainState
 {
-	StringInfo	str;			/* output buffer */
+	StringInfo	str;			/* current output buffer */
 	/* options */
 	bool		verbose;		/* be verbose */
 	bool		analyze;		/* print actual times */
@@ -40,6 +40,8 @@ typedef struct ExplainState
 	/* state for output formatting --- not reset for each new plan tree */
 	int			indent;			/* current indentation level */
 	List	   *grouping_stack; /* format-specific grouping state */
+	bool        print_workers;  /* whether current node has worker metadata */
+	StringInfo	root_str;		/* root output buffer */
 	/* state related to the current plan tree (filled by ExplainPrintPlan) */
 	PlannedStmt *pstmt;			/* top of plan */
 	List	   *rtable;			/* range table */
@@ -111,4 +113,9 @@ extern void ExplainOpenGroup(const char *objtype, const char *labelname,
 extern void ExplainCloseGroup(const char *objtype, const char *labelname,
 							  bool labeled, ExplainState *es);
 
+extern void ExplainOpenWorker(StringInfo worker_str, ExplainState *es);
+extern void ExplainCloseWorker(ExplainState *es);
+extern void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, ExplainState *es);
+
+
 #endif							/* EXPLAIN_H */
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
new file mode 100644
index 0000000000..db9914db7b
--- /dev/null
+++ b/src/test/regress/expected/explain.out
@@ -0,0 +1,75 @@
+--
+-- EXPLAIN
+--
+-- helper functions for examining json explain output
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+                                                                                                                                                                                                                                                                                                                                                                                                                                                              explain_workers                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ { "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Sort Method" : "quicksort", "Sort Space Used" : 0, "Sort Space Type" : "Memory", "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }], "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }] }] }] }
+(1 row)
+
+reset force_parallel_mode;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index d33a4e143d..d2b17dd3ea 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index f86f5c5682..acba391332 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -193,6 +193,7 @@ test: indexing
 test: partition_aggregate
 test: partition_info
 test: tuplesort
+test: explain
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
new file mode 100644
index 0000000000..5fb6e7d2e1
--- /dev/null
+++ b/src/test/regress/sql/explain.sql
@@ -0,0 +1,77 @@
+--
+-- EXPLAIN
+--
+
+-- helper functions for examining json explain output
+
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+
+reset force_parallel_mode;
#9Georgios Kokolatos
gkokolatos@pm.me
In reply to: Maciek Sakrejda (#8)
Re: Duplicate Workers entries in some EXPLAIN plans

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

The current version of the patch (v2) applies cleanly and does what it says on the box.

Any thoughts on what we should do with text mode output (which is

untouched right now)? The output Andres proposed above makes sense to
me, but I'd like to get more input.

Nothing to add beyond what is stated in the thread.

Sort Method: external merge Disk: 4920kB
Buffers: shared hit=682 read=10188, temp read=1415 written=2101
Worker 0: actual time=130.058..130.324 rows=1324 loops=1
Sort Method: external merge Disk: 5880kB
Buffers: shared hit=337 read=3489, temp read=505 written=739
Worker 1: actual time=130.273..130.512 rows=1297 loops=1
Buffers: shared hit=345 read=3507, temp read=505 written=744
Sort Method: external merge Disk: 5920kB

This proposal seems like a fitting approach. Awaiting for v3 which
will include the text version. May I suggest a format yaml test case?
Just to make certain that no regressions occur in the future.

Thanks,

#10Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Georgios Kokolatos (#9)
Re: Duplicate Workers entries in some EXPLAIN plans

Sounds good, I'll try that format. Any idea how to test YAML? With the
JSON format, I was able to rely on Postgres' own JSON-manipulating
functions to strip or canonicalize fields that can vary across
executions--I can't really do that with YAML. Or should I run EXPLAIN
with COSTS OFF, TIMING OFF, SUMMARY OFF and assume that for simple
queries the BUFFERS output (and other fields I can't turn off like
Sort Space Used) *is* going to be stable?

#11Georgios Kokolatos
gkokolatos@pm.me
In reply to: Maciek Sakrejda (#10)
Re: Duplicate Workers entries in some EXPLAIN plans

Sounds good, I'll try that format. Any idea how to test YAML? With the
JSON format, I was able to rely on Postgres' own JSON-manipulating
functions to strip or canonicalize fields that can vary across
executions--I can't really do that with YAML.

Yes, this approach was clear in the patch and works great with Json. Also
you are correct, this can not be done with YAML. I spend a bit of time to
look around and I could not find any tests really on yaml format.

Or should I run EXPLAIN
with COSTS OFF, TIMING OFF, SUMMARY OFF and assume that for simple
queries the BUFFERS output (and other fields I can't turn off like
Sort Space Used) *is* going to be stable?

I have to admit with the current diff tool used in pg_regress, this is not possible.
I am pretty certain that it *is not* going to be stable. Not for long anyways.
I withdraw my suggestion for YAML and currently awaiting for TEXT format only.

#12Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Georgios Kokolatos (#11)
1 attachment(s)
Re: Duplicate Workers entries in some EXPLAIN plans

TEXT format was tricky due to its inconsistencies, but I think I have
something working reasonably well. I added a simple test for TEXT
format output as well, using a similar approach as the JSON format
test, and liberally regexp_replacing away any volatile output. I
suppose in theory we could do this for YAML, too, but I think it's
gross enough not to be worth it, especially given the high similarity
of all the structured outputs.

Attachments:

merge-explain-worker-output-v3.patchtext/x-patch; charset=US-ASCII; name=merge-explain-worker-output-v3.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..c3c27e13a1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 								   Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -289,6 +289,7 @@ NewExplainState(void)
 	es->costs = true;
 	/* Prepare output buffer. */
 	es->str = makeStringInfo();
+	es->root_str = es->str;
 
 	return es;
 }
@@ -1090,9 +1091,20 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	StringInfo *worker_strs = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/* Prepare per-worker output */
+	if (es->analyze && planstate->worker_instrument) {
+		int num_workers = planstate->worker_instrument->num_workers;
+		int n;
+		worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+		for (n = 0; n < num_workers; n++) {
+			worker_strs[n] = makeStringInfo();
+		}
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1357,6 +1369,58 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
 	}
 
+	/* Prepare worker general execution details */
+	if (es->analyze && es->verbose && planstate->worker_instrument)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+		int			n;
+
+		for (n = 0; n < w->num_workers; ++n)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			ExplainOpenWorker(worker_strs[n], es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+				if (es->buffers)
+					show_buffer_usage(es, &instrument->bufusage);
+			}
+			else
+			{
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+		}
+
+		ExplainCloseWorker(es);
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_SeqScan:
@@ -1856,7 +1920,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			break;
 		case T_Sort:
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
-			show_sort_info(castNode(SortState, planstate), es);
+			show_sort_info(castNode(SortState, planstate), worker_strs, es);
 			break;
 		case T_MergeAppend:
 			show_merge_append_keys(castNode(MergeAppendState, planstate),
@@ -1885,74 +1949,30 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && !es->hide_workers &&
-		planstate->worker_instrument)
+	/* Prepare worker buffer usage */
+	if (es->buffers && es->analyze && es->verbose && !es->hide_workers
+		&& planstate->worker_instrument && es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		WorkerInstrumentation *w = planstate->worker_instrument;
-		bool		opened_group = false;
 		int			n;
 
 		for (n = 0; n < w->num_workers; ++n)
 		{
 			Instrumentation *instrument = &w->instrument[n];
 			double		nloops = instrument->nloops;
-			double		startup_ms;
-			double		total_ms;
-			double		rows;
 
 			if (nloops <= 0)
 				continue;
-			startup_ms = 1000.0 * instrument->startup / nloops;
-			total_ms = 1000.0 * instrument->total / nloops;
-			rows = instrument->ntuples / nloops;
-
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				appendStringInfo(es->str, "Worker %d: ", n);
-				if (es->timing)
-					appendStringInfo(es->str,
-									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
-									 startup_ms, total_ms, rows, nloops);
-				else
-					appendStringInfo(es->str,
-									 "actual rows=%.0f loops=%.0f\n",
-									 rows, nloops);
-				es->indent++;
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-				es->indent--;
-			}
-			else
-			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
 
-				if (es->timing)
-				{
-					ExplainPropertyFloat("Actual Startup Time", "ms",
-										 startup_ms, 3, es);
-					ExplainPropertyFloat("Actual Total Time", "ms",
-										 total_ms, 3, es);
-				}
-				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
-				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-
-				ExplainCloseGroup("Worker", NULL, true, es);
-			}
+			ExplainOpenWorker(worker_strs[n], es);
+			show_buffer_usage(es, &instrument->bufusage);
 		}
+		ExplainCloseWorker(es);
+	}
 
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+	/* Show worker detail */
+	if (planstate->worker_instrument) {
+		ExplainFlushWorkers(worker_strs, planstate->worker_instrument->num_workers, es);
 	}
 
 	/* Get ready to display the child plans */
@@ -2552,7 +2572,7 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
  * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
  */
 static void
-show_sort_info(SortState *sortstate, ExplainState *es)
+show_sort_info(SortState *sortstate, StringInfo *worker_strs, ExplainState *es)
 {
 	if (!es->analyze)
 		return;
@@ -2593,7 +2613,7 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
-		bool		opened_group = false;
+		bool 		indent = es->print_workers;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
@@ -2609,32 +2629,26 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 			spaceType = tuplesort_space_type_name(sinstrument->spaceType);
 			spaceUsed = sinstrument->spaceUsed;
 
+			ExplainOpenWorker(worker_strs[n], es);
+
 			if (es->format == EXPLAIN_FORMAT_TEXT)
 			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				if (n > 0 || !es->hide_workers)
-					appendStringInfo(es->str, "Worker %d:  ", n);
+				if (indent)
+				{
+					appendStringInfoSpaces(es->str, es->indent * 2);
+				}
 				appendStringInfo(es->str,
 								 "Sort Method: %s  %s: %ldkB\n",
 								 sortMethod, spaceType, spaceUsed);
 			}
 			else
 			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
 				ExplainPropertyText("Sort Method", sortMethod, es);
 				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
 				ExplainPropertyText("Sort Space Type", spaceType, es);
-				ExplainCloseGroup("Worker", NULL, true, es);
 			}
 		}
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+		ExplainCloseWorker(es);
 	}
 }
 
@@ -3751,6 +3765,75 @@ ExplainCloseGroup(const char *objtype, const char *labelname,
 	}
 }
 
+/*
+ * Begin output for a specific worker. If we are currently producing output
+ * for another worker, close that worker automatically.
+ */
+void
+ExplainOpenWorker(StringInfo worker_str, ExplainState *es)
+{
+	/*
+	 * For structured formats, we indent twice--once for the "Worker" group
+	 * and once for the "Workers" group--but only need to adjust indentation
+	 * once for each string of ExplainOpenWorker calls. For text format, we
+	 * just indent once, to add worker info on the next worker line.
+	 */
+	if (es->str == es->root_str)
+	{
+		es->indent += es->format == EXPLAIN_FORMAT_TEXT ? 1 : 2;
+	}
+
+	es->print_workers = true;
+	es->str = worker_str;
+}
+
+/*
+ * Explicitly end output for the current worker and resume output
+ * for the root of the node.
+ */
+void
+ExplainCloseWorker(ExplainState *es)
+{
+	/* as above, the text format has different indentation */
+	es->indent -= es->format == EXPLAIN_FORMAT_TEXT ? 1 : 2;
+
+	es->str = es->root_str;
+}
+
+/*
+ * Flush output registered so far to the output buffer and deallocate
+ * mechanism for writing to workers.
+ */
+void
+ExplainFlushWorkers(StringInfo* worker_strs, int num_workers, ExplainState *es)
+{
+	int i;
+
+	if (!es->print_workers) {
+		return;
+	}
+
+	ExplainOpenGroup("Workers", "Workers", false, es);
+	for (i = 0; i < num_workers; i++) {
+		ExplainOpenGroup("Worker", NULL, true, es);
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			appendStringInfoSpaces(es->str, es->indent * 2);
+			appendStringInfo(es->str, "Worker %d: ", i);
+		}
+		else
+		{
+			ExplainPropertyInteger("Worker Number", NULL, i, es);
+		}
+
+		appendStringInfoString(es->str, worker_strs[i]->data);
+		ExplainCloseGroup("Worker", NULL, true, es);
+	}
+	ExplainCloseGroup("Workers", "Workers", false, es);
+	// do we have any other cleanup to do?
+	es->print_workers = false;
+}
+
 /*
  * Emit a "dummy" group that never has any members.
  *
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 45027a7c47..2919b2780d 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -27,7 +27,7 @@ typedef enum ExplainFormat
 
 typedef struct ExplainState
 {
-	StringInfo	str;			/* output buffer */
+	StringInfo	str;			/* current output buffer */
 	/* options */
 	bool		verbose;		/* be verbose */
 	bool		analyze;		/* print actual times */
@@ -40,6 +40,8 @@ typedef struct ExplainState
 	/* state for output formatting --- not reset for each new plan tree */
 	int			indent;			/* current indentation level */
 	List	   *grouping_stack; /* format-specific grouping state */
+	bool        print_workers;  /* whether current node has worker metadata */
+	StringInfo	root_str;		/* root output buffer */
 	/* state related to the current plan tree (filled by ExplainPrintPlan) */
 	PlannedStmt *pstmt;			/* top of plan */
 	List	   *rtable;			/* range table */
@@ -111,4 +113,9 @@ extern void ExplainOpenGroup(const char *objtype, const char *labelname,
 extern void ExplainCloseGroup(const char *objtype, const char *labelname,
 							  bool labeled, ExplainState *es);
 
+extern void ExplainOpenWorker(StringInfo worker_str, ExplainState *es);
+extern void ExplainCloseWorker(ExplainState *es);
+extern void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, ExplainState *es);
+
+
 #endif							/* EXPLAIN_H */
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
new file mode 100644
index 0000000000..8034a4e0db
--- /dev/null
+++ b/src/test/regress/expected/explain.out
@@ -0,0 +1,114 @@
+--
+-- EXPLAIN
+--
+-- helper functions for examining explain output
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+-- return a text explain of given query
+CREATE OR REPLACE FUNCTION text_explain(query text, variadic opts text[])
+  RETURNS text LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan_row RECORD;
+  plan text = '';
+BEGIN
+  FOR plan_row IN EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT TEXT) '), ', ') || query LOOP
+    plan = plan || plan_row."QUERY PLAN" || E'\n';
+  END LOOP;
+
+  RETURN plan;
+END;
+$$;
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+                                                                                                                                                                                                                                                                                                                                                                                                                                                              explain_workers                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ { "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Sort Method" : "quicksort", "Sort Space Used" : 0, "Sort Space Type" : "Memory", "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }], "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }] }] }] }
+(1 row)
+
+SELECT regexp_replace(regexp_replace(text_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers', 'timing off', 'costs off', 'summary off'),
+  'Buffers: .+?$', 'Buffers: xxx', 'n'), 'Sort Method: .+?$', 'Sort Method: xxx', 'n');
+                          regexp_replace                           
+-------------------------------------------------------------------
+ Gather (actual rows=2 loops=1)                                   +
+   Output: "*VALUES*".column1, (ROW("*VALUES*".column1))          +
+   Workers Planned: 1                                             +
+   Workers Launched: 1                                            +
+   Single Copy: true                                              +
+   Buffers: xxx                                                   +
+   ->  Sort (actual rows=2 loops=1)                               +
+         Output: "*VALUES*".column1, (ROW("*VALUES*".column1))    +
+         Sort Key: (ROW("*VALUES*".column1))                      +
+         Buffers: shared hit=114                                  +
+         Worker 0: actual rows=2 loops=1                          +
+           Buffers: shared hit=114                                +
+           Sort Method: xxx                                       +
+         ->  Values Scan on "*VALUES*" (actual rows=2 loops=1)    +
+               Output: "*VALUES*".column1, ROW("*VALUES*".column1)+
+               Worker 0: actual rows=2 loops=1                    +
+ 
+(1 row)
+
+reset force_parallel_mode;
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..cba15f7133 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -591,10 +591,10 @@ select * from explain_parallel_sort_stats();
          ->  Sort (actual rows=2000 loops=15)
                Sort Key: tenk1.ten
                Sort Method: quicksort  Memory: xxx
-               Worker 0:  Sort Method: quicksort  Memory: xxx
-               Worker 1:  Sort Method: quicksort  Memory: xxx
-               Worker 2:  Sort Method: quicksort  Memory: xxx
-               Worker 3:  Sort Method: quicksort  Memory: xxx
+               Worker 0: Sort Method: quicksort  Memory: xxx
+               Worker 1: Sort Method: quicksort  Memory: xxx
+               Worker 2: Sort Method: quicksort  Memory: xxx
+               Worker 3: Sort Method: quicksort  Memory: xxx
                ->  Parallel Seq Scan on tenk1 (actual rows=2000 loops=15)
                      Filter: (ten < 100)
 (14 rows)
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index d33a4e143d..d2b17dd3ea 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index f86f5c5682..acba391332 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -193,6 +193,7 @@ test: indexing
 test: partition_aggregate
 test: partition_info
 test: tuplesort
+test: explain
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
new file mode 100644
index 0000000000..03accb1b83
--- /dev/null
+++ b/src/test/regress/sql/explain.sql
@@ -0,0 +1,98 @@
+--
+-- EXPLAIN
+--
+
+-- helper functions for examining explain output
+
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+
+-- return a text explain of given query
+CREATE OR REPLACE FUNCTION text_explain(query text, variadic opts text[])
+  RETURNS text LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan_row RECORD;
+  plan text = '';
+BEGIN
+  FOR plan_row IN EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT TEXT) '), ', ') || query LOOP
+    plan = plan || plan_row."QUERY PLAN" || E'\n';
+  END LOOP;
+
+  RETURN plan;
+END;
+$$;
+
+
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+
+SELECT regexp_replace(regexp_replace(text_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers', 'timing off', 'costs off', 'summary off'),
+  'Buffers: .+?$', 'Buffers: xxx', 'n'), 'Sort Method: .+?$', 'Sort Method: xxx', 'n');
+
+reset force_parallel_mode;
#13Georgios Kokolatos
gkokolatos@pm.me
In reply to: Maciek Sakrejda (#12)
Re: Duplicate Workers entries in some EXPLAIN plans

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

TEXT format was tricky due to its inconsistencies, but I think I have
something working reasonably well. I added a simple test for TEXT
format output as well, using a similar approach as the JSON format

Great!

test, and liberally regexp_replacing away any volatile output. I
suppose in theory we could do this for YAML, too, but I think it's
gross enough not to be worth it, especially given the high similarity
of all the structured outputs.

Agreed, what is in the patch suffices. Overall great work, a couple of
minor nitpicks if you allow me.

+   /* Prepare per-worker output */
+   if (es->analyze && planstate->worker_instrument) {

Style, parenthesis on its own line.

+       int num_workers = planstate->worker_instrument->num_workers;
+       int n;
+       worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+       for (n = 0; n < num_workers; n++) {

I think C99 would be better here. Also no parenthesis needed.

+           worker_strs[n] = makeStringInfo();
+       }
+   }

@@ -1357,6 +1369,58 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
}

+   /* Prepare worker general execution details */
+   if (es->analyze && es->verbose && planstate->worker_instrument)
+   {
+       WorkerInstrumentation *w = planstate->worker_instrument;
+       int         n;
+
+       for (n = 0; n < w->num_workers; ++n)

I think C99 would be better here.

+       {
+           Instrumentation *instrument = &w->instrument[n];
+           double      nloops = instrument->nloops;
-               appendStringInfoSpaces(es->str, es->indent * 2);
-               if (n > 0 || !es->hide_workers)
-                   appendStringInfo(es->str, "Worker %d:  ", n);
+               if (indent)
+               {
+                   appendStringInfoSpaces(es->str, es->indent * 2);
+               }

Style: No parenthesis needed

-       if (opened_group)
-           ExplainCloseGroup("Workers", "Workers", false, es);
+   /* Show worker detail */
+   if (planstate->worker_instrument) {
+       ExplainFlushWorkers(worker_strs, planstate->worker_instrument->num_workers, es);
    }

Style: No parenthesis needed

+    * just indent once, to add worker info on the next worker line.
+    */
+   if (es->str == es->root_str)
+   {
+       es->indent += es->format == EXPLAIN_FORMAT_TEXT ? 1 : 2;
+   }
+

Style: No parenthesis needed

+   ExplainCloseGroup("Workers", "Workers", false, es);
+   // do we have any other cleanup to do?

This comment does not really explain anything. Either remove
or rephrase. Also C style comments.

+ es->print_workers = false;
+}

int indent; /* current indentation level */
List *grouping_stack; /* format-specific grouping state */
+ bool print_workers; /* whether current node has worker metadata */

Hmm.. commit <b925a00f4ef> introduced `hide_workers` in the struct. Having both
names in the struct so far apart even seems a bit confusing and sloppy. Do you
think it would be possible to combine or rename?

+extern void ExplainOpenWorker(StringInfo worker_str, ExplainState *es);
+extern void ExplainCloseWorker(ExplainState *es);
+extern void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, ExplainState *es);

No need to expose those, is there? I feel there should be static.

Awaiting for answer or resolution of these comments to change the status.

//Georgios

#14Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Georgios Kokolatos (#13)
Re: Duplicate Workers entries in some EXPLAIN plans

Thanks! I'll fix the brace issues. Re: the other items:

+       int num_workers = planstate->worker_instrument->num_workers;
+       int n;
+       worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+       for (n = 0; n < num_workers; n++) {

I think C99 would be better here. Also no parenthesis needed.

Pardon my C illiteracy, but what am I doing that's not valid C99 here?

+   /* Prepare worker general execution details */
+   if (es->analyze && es->verbose && planstate->worker_instrument)
+   {
+       WorkerInstrumentation *w = planstate->worker_instrument;
+       int         n;
+
+       for (n = 0; n < w->num_workers; ++n)

I think C99 would be better here.

And here (if it's not the same problem)?

+   ExplainCloseGroup("Workers", "Workers", false, es);
+   // do we have any other cleanup to do?

This comment does not really explain anything. Either remove
or rephrase. Also C style comments.

Good catch, thanks--I had put this in to remind myself (and reviewers)
about cleanup, but I don't think there's anything else to do, so I'll
just drop it.

int indent; /* current indentation level */
List *grouping_stack; /* format-specific grouping state */
+ bool print_workers; /* whether current node has worker metadata */

Hmm.. commit <b925a00f4ef> introduced `hide_workers` in the struct. Having both
names in the struct so far apart even seems a bit confusing and sloppy. Do you
think it would be possible to combine or rename?

I noticed that. I was thinking about combining them, but
"hide_workers" seems to be about "pretend there is no worker output
even if there is" and "print_workers" is "keep track of whether or not
there is worker output to print". Maybe I'll rename to
"has_worker_output"?

+extern void ExplainOpenWorker(StringInfo worker_str, ExplainState *es);
+extern void ExplainCloseWorker(ExplainState *es);
+extern void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, ExplainState *es);

No need to expose those, is there? I feel there should be static.

Good call, I'll update.

#15Georgios Kokolatos
gkokolatos@pm.me
In reply to: Maciek Sakrejda (#14)
Re: Duplicate Workers entries in some EXPLAIN plans
+       int num_workers = planstate->worker_instrument->num_workers;
+       int n;
+       worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+       for (n = 0; n < num_workers; n++) {

I think C99 would be better here. Also no parenthesis needed.

Pardon my C illiteracy, but what am I doing that's not valid C99 here?

My bad, I should have been more clear. I meant that it is preferable to use
the C99 standard which calls for declaring variables in the scope that you
need them. In this case, 'n' is needed only in the for loop, so something like

for (int n = 0; n < num_workers; n++)

is preferable. To be clear, your code was perfectly valid. It was only the
style I was referring to.

+ for (n = 0; n < w->num_workers; ++n)

I think C99 would be better here.

And here (if it's not the same problem)?

Exactly the same as above.

int indent; /* current indentation level */
List *grouping_stack; /* format-specific grouping state */
+ bool print_workers; /* whether current node has worker metadata */

Hmm.. commit <b925a00f4ef> introduced `hide_workers` in the struct. Having both
names in the struct so far apart even seems a bit confusing and sloppy. Do you
think it would be possible to combine or rename?

I noticed that. I was thinking about combining them, but
"hide_workers" seems to be about "pretend there is no worker output
even if there is" and "print_workers" is "keep track of whether or not
there is worker output to print". Maybe I'll rename to
"has_worker_output"?

The rename sounds a bit better in my humble opinion. Thanks.

#16Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Georgios Kokolatos (#15)
1 attachment(s)
Re: Duplicate Workers entries in some EXPLAIN plans

On Wed, Jan 22, 2020 at 9:37 AM Georgios Kokolatos <gkokolatos@pm.me> wrote:

My bad, I should have been more clear. I meant that it is preferable to use
the C99 standard which calls for declaring variables in the scope that you
need them.

Ah, I see. I think I got that from ExplainPrintSettings. I've
corrected my usage--thanks for pointing it out. I appreciate the
effort to maintain a consistent style.

int indent; /* current indentation level */
List *grouping_stack; /* format-specific grouping state */
+ bool print_workers; /* whether current node has worker metadata */

Hmm.. commit <b925a00f4ef> introduced `hide_workers` in the struct. Having both
names in the struct so far apart even seems a bit confusing and sloppy. Do you
think it would be possible to combine or rename?

I noticed that. I was thinking about combining them, but
"hide_workers" seems to be about "pretend there is no worker output
even if there is" and "print_workers" is "keep track of whether or not
there is worker output to print". Maybe I'll rename to
"has_worker_output"?

The rename sounds a bit better in my humble opinion. Thanks.

Also, reviewing my code again, I noticed that when I moved the general
worker output earlier, I missed part of the merge conflict: I had
replaced

- /* Show worker detail */
- if (es->analyze && es->verbose && !es->hide_workers &&
- planstate->worker_instrument)

with

+       /* Prepare worker general execution details */
+       if (es->analyze && es->verbose && planstate->worker_instrument)

which ignores the es->hide_workers flag (it did not fail the tests,
but the intent is pretty clear). I've corrected this in the current
patch.

I also noticed that we can now handle worker buffer output more
consistently across TEXT and structured formats, so I made that small
change too:

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 140d0be426..b23b015594 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1401,8 +1401,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
                                        appendStringInfo(es->str,

"actual rows=%.0f loops=%.0f\n",

  rows, nloops);
-                               if (es->buffers)
-                                       show_buffer_usage(es,
&instrument->bufusage);
                        }
                        else
                        {
@@ -1951,7 +1949,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
        /* Prepare worker buffer usage */
        if (es->buffers && es->analyze && es->verbose && !es->hide_workers
-               && planstate->worker_instrument && es->format !=
EXPLAIN_FORMAT_TEXT)
+               && planstate->worker_instrument)
        {
                WorkerInstrumentation *w = planstate->worker_instrument;
                int                     n;
diff --git a/src/test/regress/expected/explain.out
b/src/test/regress/expected/explain.out
index 8034a4e0db..a4eed3067f 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -103,8 +103,8 @@ $$, 'verbose', 'analyze', 'buffers', 'timing off',
'costs off', 'summary off'),
          Sort Key: (ROW("*VALUES*".column1))                      +
          Buffers: shared hit=114                                  +
          Worker 0: actual rows=2 loops=1                          +
-           Buffers: shared hit=114                                +
            Sort Method: xxx                                       +
+           Buffers: shared hit=114                                +
          ->  Values Scan on "*VALUES*" (actual rows=2 loops=1)    +
                Output: "*VALUES*".column1, ROW("*VALUES*".column1)+
                Worker 0: actual rows=2 loops=1                    +

I think the "producing plan output for a worker" process is easier to
reason about now, and while it changes TEXT format worker output
order, the other changes in this patch are more drastic so this
probably does not matter.

I've also addressed the other feedback above, and reworded a couple of
comments slightly.

Attachments:

merge-explain-worker-output-v4.patchtext/x-patch; charset=US-ASCII; name=merge-explain-worker-output-v4.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..b4108f82ec 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 								   Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -131,7 +131,9 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
-
+static void ExplainOpenWorker(StringInfo worker_str, ExplainState *es);
+static void ExplainCloseWorker(ExplainState *es);
+static void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, ExplainState *es);
 
 
 /*
@@ -289,6 +291,7 @@ NewExplainState(void)
 	es->costs = true;
 	/* Prepare output buffer. */
 	es->str = makeStringInfo();
+	es->root_str = es->str;
 
 	return es;
 }
@@ -1090,9 +1093,19 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	StringInfo *worker_strs = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/* Prepare per-worker output */
+	if (es->analyze && planstate->worker_instrument)
+	{
+		int num_workers = planstate->worker_instrument->num_workers;
+		worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+		for (int n = 0; n < num_workers; n++)
+			worker_strs[n] = makeStringInfo();
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1357,6 +1370,55 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
 	}
 
+	/* Prepare worker general execution details */
+	if (es->analyze && es->verbose && !es->hide_workers && planstate->worker_instrument)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+
+		for (int n = 0; n < w->num_workers; ++n)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			ExplainOpenWorker(worker_strs[n], es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+			}
+			else
+			{
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+		}
+
+		ExplainCloseWorker(es);
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_SeqScan:
@@ -1856,7 +1918,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			break;
 		case T_Sort:
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
-			show_sort_info(castNode(SortState, planstate), es);
+			show_sort_info(castNode(SortState, planstate), worker_strs, es);
 			break;
 		case T_MergeAppend:
 			show_merge_append_keys(castNode(MergeAppendState, planstate),
@@ -1885,76 +1947,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && !es->hide_workers &&
-		planstate->worker_instrument)
+	/* Prepare worker buffer usage */
+	if (es->buffers && es->analyze && es->verbose && !es->hide_workers
+		&& planstate->worker_instrument)
 	{
 		WorkerInstrumentation *w = planstate->worker_instrument;
-		bool		opened_group = false;
 		int			n;
 
 		for (n = 0; n < w->num_workers; ++n)
 		{
 			Instrumentation *instrument = &w->instrument[n];
 			double		nloops = instrument->nloops;
-			double		startup_ms;
-			double		total_ms;
-			double		rows;
 
 			if (nloops <= 0)
 				continue;
-			startup_ms = 1000.0 * instrument->startup / nloops;
-			total_ms = 1000.0 * instrument->total / nloops;
-			rows = instrument->ntuples / nloops;
 
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				appendStringInfo(es->str, "Worker %d: ", n);
-				if (es->timing)
-					appendStringInfo(es->str,
-									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
-									 startup_ms, total_ms, rows, nloops);
-				else
-					appendStringInfo(es->str,
-									 "actual rows=%.0f loops=%.0f\n",
-									 rows, nloops);
-				es->indent++;
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-				es->indent--;
-			}
-			else
-			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
-
-				if (es->timing)
-				{
-					ExplainPropertyFloat("Actual Startup Time", "ms",
-										 startup_ms, 3, es);
-					ExplainPropertyFloat("Actual Total Time", "ms",
-										 total_ms, 3, es);
-				}
-				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
-				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-
-				ExplainCloseGroup("Worker", NULL, true, es);
-			}
+			ExplainOpenWorker(worker_strs[n], es);
+			show_buffer_usage(es, &instrument->bufusage);
 		}
-
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+		ExplainCloseWorker(es);
 	}
 
+	/* Show worker detail */
+	if (planstate->worker_instrument)
+		ExplainFlushWorkers(worker_strs, planstate->worker_instrument->num_workers, es);
+
 	/* Get ready to display the child plans */
 	haschildren = planstate->initPlan ||
 		outerPlanState(planstate) ||
@@ -2552,7 +2569,7 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
  * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
  */
 static void
-show_sort_info(SortState *sortstate, ExplainState *es)
+show_sort_info(SortState *sortstate, StringInfo *worker_strs, ExplainState *es)
 {
 	if (!es->analyze)
 		return;
@@ -2593,7 +2610,7 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
-		bool		opened_group = false;
+		bool 		indent = es->has_worker_output;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
@@ -2609,32 +2626,24 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 			spaceType = tuplesort_space_type_name(sinstrument->spaceType);
 			spaceUsed = sinstrument->spaceUsed;
 
+			ExplainOpenWorker(worker_strs[n], es);
+
 			if (es->format == EXPLAIN_FORMAT_TEXT)
 			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				if (n > 0 || !es->hide_workers)
-					appendStringInfo(es->str, "Worker %d:  ", n);
+				if (indent)
+					appendStringInfoSpaces(es->str, es->indent * 2);
 				appendStringInfo(es->str,
 								 "Sort Method: %s  %s: %ldkB\n",
 								 sortMethod, spaceType, spaceUsed);
 			}
 			else
 			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
 				ExplainPropertyText("Sort Method", sortMethod, es);
 				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
 				ExplainPropertyText("Sort Space Type", spaceType, es);
-				ExplainCloseGroup("Worker", NULL, true, es);
 			}
 		}
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+		ExplainCloseWorker(es);
 	}
 }
 
@@ -3751,6 +3760,68 @@ ExplainCloseGroup(const char *objtype, const char *labelname,
 	}
 }
 
+/*
+ * Begin output for a specific worker. If we are currently producing output
+ * for another worker, close that worker automatically.
+ */
+static void
+ExplainOpenWorker(StringInfo worker_str, ExplainState *es)
+{
+	/*
+	 * For structured formats, we indent twice: once for the "Worker" group
+	 * itself and once for the surrounding "Workers" group. For text format,
+	 * we just indent once, to add worker info on the next worker line. Either
+	 * way, we only need to adjust indentation once for each string of
+	 * ExplainOpenWorker calls.
+	 */
+	if (es->str == es->root_str)
+		es->indent += es->format == EXPLAIN_FORMAT_TEXT ? 1 : 2;
+
+	es->has_worker_output = true;
+	es->str = worker_str;
+}
+
+/*
+ * Explicitly end output for the current worker and resume output
+ * for the root of the node.
+ */
+static void
+ExplainCloseWorker(ExplainState *es)
+{
+	/* as above, the text format has different indentation */
+	es->indent -= es->format == EXPLAIN_FORMAT_TEXT ? 1 : 2;
+
+	es->str = es->root_str;
+}
+
+/*
+ * Flush worker output for this node and reset the worker output state.
+ */
+static void
+ExplainFlushWorkers(StringInfo* worker_strs, int num_workers, ExplainState *es)
+{
+	if (!es->has_worker_output)
+		return;
+
+	ExplainOpenGroup("Workers", "Workers", false, es);
+	for (int i = 0; i < num_workers; i++)
+	{
+		ExplainOpenGroup("Worker", NULL, true, es);
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			appendStringInfoSpaces(es->str, es->indent * 2);
+			appendStringInfo(es->str, "Worker %d: ", i);
+		}
+		else
+			ExplainPropertyInteger("Worker Number", NULL, i, es);
+
+		appendStringInfoString(es->str, worker_strs[i]->data);
+		ExplainCloseGroup("Worker", NULL, true, es);
+	}
+	ExplainCloseGroup("Workers", "Workers", false, es);
+	es->has_worker_output = false;
+}
+
 /*
  * Emit a "dummy" group that never has any members.
  *
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 45027a7c47..1f9208b571 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -27,7 +27,7 @@ typedef enum ExplainFormat
 
 typedef struct ExplainState
 {
-	StringInfo	str;			/* output buffer */
+	StringInfo	str;			/* current output buffer */
 	/* options */
 	bool		verbose;		/* be verbose */
 	bool		analyze;		/* print actual times */
@@ -40,6 +40,8 @@ typedef struct ExplainState
 	/* state for output formatting --- not reset for each new plan tree */
 	int			indent;			/* current indentation level */
 	List	   *grouping_stack; /* format-specific grouping state */
+	bool	    has_worker_output;	/* whether current node has worker metadata */
+	StringInfo	root_str;		/* root output buffer */
 	/* state related to the current plan tree (filled by ExplainPrintPlan) */
 	PlannedStmt *pstmt;			/* top of plan */
 	List	   *rtable;			/* range table */
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
new file mode 100644
index 0000000000..a4eed3067f
--- /dev/null
+++ b/src/test/regress/expected/explain.out
@@ -0,0 +1,114 @@
+--
+-- EXPLAIN
+--
+-- helper functions for examining explain output
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+-- return a text explain of given query
+CREATE OR REPLACE FUNCTION text_explain(query text, variadic opts text[])
+  RETURNS text LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan_row RECORD;
+  plan text = '';
+BEGIN
+  FOR plan_row IN EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT TEXT) '), ', ') || query LOOP
+    plan = plan || plan_row."QUERY PLAN" || E'\n';
+  END LOOP;
+
+  RETURN plan;
+END;
+$$;
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+                                                                                                                                                                                                                                                                                                                                                                                                                                                              explain_workers                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ { "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Sort Method" : "quicksort", "Sort Space Used" : 0, "Sort Space Type" : "Memory", "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }], "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }] }] }] }
+(1 row)
+
+SELECT regexp_replace(regexp_replace(text_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers', 'timing off', 'costs off', 'summary off'),
+  'Buffers: .+?$', 'Buffers: xxx', 'n'), 'Sort Method: .+?$', 'Sort Method: xxx', 'n');
+                          regexp_replace                           
+-------------------------------------------------------------------
+ Gather (actual rows=2 loops=1)                                   +
+   Output: "*VALUES*".column1, (ROW("*VALUES*".column1))          +
+   Workers Planned: 1                                             +
+   Workers Launched: 1                                            +
+   Single Copy: true                                              +
+   Buffers: xxx                                                   +
+   ->  Sort (actual rows=2 loops=1)                               +
+         Output: "*VALUES*".column1, (ROW("*VALUES*".column1))    +
+         Sort Key: (ROW("*VALUES*".column1))                      +
+         Buffers: shared hit=114                                  +
+         Worker 0: actual rows=2 loops=1                          +
+           Sort Method: xxx                                       +
+           Buffers: shared hit=114                                +
+         ->  Values Scan on "*VALUES*" (actual rows=2 loops=1)    +
+               Output: "*VALUES*".column1, ROW("*VALUES*".column1)+
+               Worker 0: actual rows=2 loops=1                    +
+ 
+(1 row)
+
+reset force_parallel_mode;
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..cba15f7133 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -591,10 +591,10 @@ select * from explain_parallel_sort_stats();
          ->  Sort (actual rows=2000 loops=15)
                Sort Key: tenk1.ten
                Sort Method: quicksort  Memory: xxx
-               Worker 0:  Sort Method: quicksort  Memory: xxx
-               Worker 1:  Sort Method: quicksort  Memory: xxx
-               Worker 2:  Sort Method: quicksort  Memory: xxx
-               Worker 3:  Sort Method: quicksort  Memory: xxx
+               Worker 0: Sort Method: quicksort  Memory: xxx
+               Worker 1: Sort Method: quicksort  Memory: xxx
+               Worker 2: Sort Method: quicksort  Memory: xxx
+               Worker 3: Sort Method: quicksort  Memory: xxx
                ->  Parallel Seq Scan on tenk1 (actual rows=2000 loops=15)
                      Filter: (ten < 100)
 (14 rows)
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index d33a4e143d..d2b17dd3ea 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index f86f5c5682..acba391332 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -193,6 +193,7 @@ test: indexing
 test: partition_aggregate
 test: partition_info
 test: tuplesort
+test: explain
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
new file mode 100644
index 0000000000..03accb1b83
--- /dev/null
+++ b/src/test/regress/sql/explain.sql
@@ -0,0 +1,98 @@
+--
+-- EXPLAIN
+--
+
+-- helper functions for examining explain output
+
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+
+-- return a text explain of given query
+CREATE OR REPLACE FUNCTION text_explain(query text, variadic opts text[])
+  RETURNS text LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan_row RECORD;
+  plan text = '';
+BEGIN
+  FOR plan_row IN EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT TEXT) '), ', ') || query LOOP
+    plan = plan || plan_row."QUERY PLAN" || E'\n';
+  END LOOP;
+
+  RETURN plan;
+END;
+$$;
+
+
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+
+SELECT regexp_replace(regexp_replace(text_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers', 'timing off', 'costs off', 'summary off'),
+  'Buffers: .+?$', 'Buffers: xxx', 'n'), 'Sort Method: .+?$', 'Sort Method: xxx', 'n');
+
+reset force_parallel_mode;
#17Georgios Kokolatos
gkokolatos@pm.me
In reply to: Maciek Sakrejda (#16)
Re: Duplicate Workers entries in some EXPLAIN plans

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Ah, I see. I think I got that from ExplainPrintSettings. I've
corrected my usage--thanks for pointing it out. I appreciate the
effort to maintain a consistent style.

Thanks, I am just following the reviewing guide to be honest.

Also, reviewing my code again, I noticed that when I moved the general
worker output earlier, I missed part of the merge conflict:

Right. I thought that was intentional.

which ignores the es->hide_workers flag (it did not fail the tests,
but the intent is pretty clear). I've corrected this in the current
patch.

Noted and appreciated.

I also noticed that we can now handle worker buffer output more
consistently across TEXT and structured formats, so I made that small
change too:

Looks good.

I think the "producing plan output for a worker" process is easier to
reason about now, and while it changes TEXT format worker output
order, the other changes in this patch are more drastic so this
probably does not matter.

I've also addressed the other feedback above, and reworded a couple of
comments slightly.

Thanks for the above. Looks clean, does what it says in the tin and solves a
problem that needs solving. All relevant installcheck-world pass. As far as I
am concerned, I think it is ready to be sent to a committer. Updating the status
accordingly.

The new status of this patch is: Ready for Committer

#18Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Georgios Kokolatos (#17)
1 attachment(s)
Re: Duplicate Workers entries in some EXPLAIN plans

Great, thank you. I noticed in the CF patch tester app, the build
fails on Windows [1]https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.76313. Investigating, I realized I had failed to fully
strip volatile EXPLAIN info (namely buffers) in TEXT mode due to a
bad regexp_replace. I've fixed this in the attached patch (which I
also rebased against current master again).

[1]: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.76313

Attachments:

merge-explain-worker-output-v5.patchtext/x-patch; charset=US-ASCII; name=merge-explain-worker-output-v5.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..b4108f82ec 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 								   Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -131,7 +131,9 @@ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
-
+static void ExplainOpenWorker(StringInfo worker_str, ExplainState *es);
+static void ExplainCloseWorker(ExplainState *es);
+static void ExplainFlushWorkers(StringInfo *worker_strs, int num_workers, ExplainState *es);
 
 
 /*
@@ -289,6 +291,7 @@ NewExplainState(void)
 	es->costs = true;
 	/* Prepare output buffer. */
 	es->str = makeStringInfo();
+	es->root_str = es->str;
 
 	return es;
 }
@@ -1090,9 +1093,19 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	StringInfo *worker_strs = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/* Prepare per-worker output */
+	if (es->analyze && planstate->worker_instrument)
+	{
+		int num_workers = planstate->worker_instrument->num_workers;
+		worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+		for (int n = 0; n < num_workers; n++)
+			worker_strs[n] = makeStringInfo();
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1357,6 +1370,55 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
 	}
 
+	/* Prepare worker general execution details */
+	if (es->analyze && es->verbose && !es->hide_workers && planstate->worker_instrument)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+
+		for (int n = 0; n < w->num_workers; ++n)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			ExplainOpenWorker(worker_strs[n], es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+			}
+			else
+			{
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+		}
+
+		ExplainCloseWorker(es);
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_SeqScan:
@@ -1856,7 +1918,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			break;
 		case T_Sort:
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
-			show_sort_info(castNode(SortState, planstate), es);
+			show_sort_info(castNode(SortState, planstate), worker_strs, es);
 			break;
 		case T_MergeAppend:
 			show_merge_append_keys(castNode(MergeAppendState, planstate),
@@ -1885,76 +1947,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && !es->hide_workers &&
-		planstate->worker_instrument)
+	/* Prepare worker buffer usage */
+	if (es->buffers && es->analyze && es->verbose && !es->hide_workers
+		&& planstate->worker_instrument)
 	{
 		WorkerInstrumentation *w = planstate->worker_instrument;
-		bool		opened_group = false;
 		int			n;
 
 		for (n = 0; n < w->num_workers; ++n)
 		{
 			Instrumentation *instrument = &w->instrument[n];
 			double		nloops = instrument->nloops;
-			double		startup_ms;
-			double		total_ms;
-			double		rows;
 
 			if (nloops <= 0)
 				continue;
-			startup_ms = 1000.0 * instrument->startup / nloops;
-			total_ms = 1000.0 * instrument->total / nloops;
-			rows = instrument->ntuples / nloops;
 
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				appendStringInfo(es->str, "Worker %d: ", n);
-				if (es->timing)
-					appendStringInfo(es->str,
-									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
-									 startup_ms, total_ms, rows, nloops);
-				else
-					appendStringInfo(es->str,
-									 "actual rows=%.0f loops=%.0f\n",
-									 rows, nloops);
-				es->indent++;
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-				es->indent--;
-			}
-			else
-			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
-
-				if (es->timing)
-				{
-					ExplainPropertyFloat("Actual Startup Time", "ms",
-										 startup_ms, 3, es);
-					ExplainPropertyFloat("Actual Total Time", "ms",
-										 total_ms, 3, es);
-				}
-				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
-				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-
-				ExplainCloseGroup("Worker", NULL, true, es);
-			}
+			ExplainOpenWorker(worker_strs[n], es);
+			show_buffer_usage(es, &instrument->bufusage);
 		}
-
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+		ExplainCloseWorker(es);
 	}
 
+	/* Show worker detail */
+	if (planstate->worker_instrument)
+		ExplainFlushWorkers(worker_strs, planstate->worker_instrument->num_workers, es);
+
 	/* Get ready to display the child plans */
 	haschildren = planstate->initPlan ||
 		outerPlanState(planstate) ||
@@ -2552,7 +2569,7 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
  * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
  */
 static void
-show_sort_info(SortState *sortstate, ExplainState *es)
+show_sort_info(SortState *sortstate, StringInfo *worker_strs, ExplainState *es)
 {
 	if (!es->analyze)
 		return;
@@ -2593,7 +2610,7 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
-		bool		opened_group = false;
+		bool 		indent = es->has_worker_output;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
@@ -2609,32 +2626,24 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 			spaceType = tuplesort_space_type_name(sinstrument->spaceType);
 			spaceUsed = sinstrument->spaceUsed;
 
+			ExplainOpenWorker(worker_strs[n], es);
+
 			if (es->format == EXPLAIN_FORMAT_TEXT)
 			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				if (n > 0 || !es->hide_workers)
-					appendStringInfo(es->str, "Worker %d:  ", n);
+				if (indent)
+					appendStringInfoSpaces(es->str, es->indent * 2);
 				appendStringInfo(es->str,
 								 "Sort Method: %s  %s: %ldkB\n",
 								 sortMethod, spaceType, spaceUsed);
 			}
 			else
 			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
 				ExplainPropertyText("Sort Method", sortMethod, es);
 				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
 				ExplainPropertyText("Sort Space Type", spaceType, es);
-				ExplainCloseGroup("Worker", NULL, true, es);
 			}
 		}
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+		ExplainCloseWorker(es);
 	}
 }
 
@@ -3751,6 +3760,68 @@ ExplainCloseGroup(const char *objtype, const char *labelname,
 	}
 }
 
+/*
+ * Begin output for a specific worker. If we are currently producing output
+ * for another worker, close that worker automatically.
+ */
+static void
+ExplainOpenWorker(StringInfo worker_str, ExplainState *es)
+{
+	/*
+	 * For structured formats, we indent twice: once for the "Worker" group
+	 * itself and once for the surrounding "Workers" group. For text format,
+	 * we just indent once, to add worker info on the next worker line. Either
+	 * way, we only need to adjust indentation once for each string of
+	 * ExplainOpenWorker calls.
+	 */
+	if (es->str == es->root_str)
+		es->indent += es->format == EXPLAIN_FORMAT_TEXT ? 1 : 2;
+
+	es->has_worker_output = true;
+	es->str = worker_str;
+}
+
+/*
+ * Explicitly end output for the current worker and resume output
+ * for the root of the node.
+ */
+static void
+ExplainCloseWorker(ExplainState *es)
+{
+	/* as above, the text format has different indentation */
+	es->indent -= es->format == EXPLAIN_FORMAT_TEXT ? 1 : 2;
+
+	es->str = es->root_str;
+}
+
+/*
+ * Flush worker output for this node and reset the worker output state.
+ */
+static void
+ExplainFlushWorkers(StringInfo* worker_strs, int num_workers, ExplainState *es)
+{
+	if (!es->has_worker_output)
+		return;
+
+	ExplainOpenGroup("Workers", "Workers", false, es);
+	for (int i = 0; i < num_workers; i++)
+	{
+		ExplainOpenGroup("Worker", NULL, true, es);
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			appendStringInfoSpaces(es->str, es->indent * 2);
+			appendStringInfo(es->str, "Worker %d: ", i);
+		}
+		else
+			ExplainPropertyInteger("Worker Number", NULL, i, es);
+
+		appendStringInfoString(es->str, worker_strs[i]->data);
+		ExplainCloseGroup("Worker", NULL, true, es);
+	}
+	ExplainCloseGroup("Workers", "Workers", false, es);
+	es->has_worker_output = false;
+}
+
 /*
  * Emit a "dummy" group that never has any members.
  *
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 45027a7c47..1f9208b571 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -27,7 +27,7 @@ typedef enum ExplainFormat
 
 typedef struct ExplainState
 {
-	StringInfo	str;			/* output buffer */
+	StringInfo	str;			/* current output buffer */
 	/* options */
 	bool		verbose;		/* be verbose */
 	bool		analyze;		/* print actual times */
@@ -40,6 +40,8 @@ typedef struct ExplainState
 	/* state for output formatting --- not reset for each new plan tree */
 	int			indent;			/* current indentation level */
 	List	   *grouping_stack; /* format-specific grouping state */
+	bool	    has_worker_output;	/* whether current node has worker metadata */
+	StringInfo	root_str;		/* root output buffer */
 	/* state related to the current plan tree (filled by ExplainPrintPlan) */
 	PlannedStmt *pstmt;			/* top of plan */
 	List	   *rtable;			/* range table */
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
new file mode 100644
index 0000000000..12ea1f0e5b
--- /dev/null
+++ b/src/test/regress/expected/explain.out
@@ -0,0 +1,114 @@
+--
+-- EXPLAIN
+--
+-- helper functions for examining explain output
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+-- return a text explain of given query
+CREATE OR REPLACE FUNCTION text_explain(query text, variadic opts text[])
+  RETURNS text LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan_row RECORD;
+  plan text = '';
+BEGIN
+  FOR plan_row IN EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT TEXT) '), ', ') || query LOOP
+    plan = plan || plan_row."QUERY PLAN" || E'\n';
+  END LOOP;
+
+  RETURN plan;
+END;
+$$;
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+                                                                                                                                                                                                                                                                                                                                                                                                                                                              explain_workers                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ { "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Sort Method" : "quicksort", "Sort Space Used" : 0, "Sort Space Type" : "Memory", "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }], "Plans" : [{ "Workers" : [{ "Worker Number" : 0, "Actual Startup Time" : 0, "Actual Total Time" : 0, "Actual Rows" : 2, "Actual Loops" : 1, "Shared Hit Blocks" : 0, "Shared Read Blocks" : 0, "Shared Dirtied Blocks" : 0, "Shared Written Blocks" : 0, "Local Hit Blocks" : 0, "Local Read Blocks" : 0, "Local Dirtied Blocks" : 0, "Local Written Blocks" : 0, "Temp Read Blocks" : 0, "Temp Written Blocks" : 0 }] }] }] }
+(1 row)
+
+SELECT regexp_replace(regexp_replace(text_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers', 'timing off', 'costs off', 'summary off'),
+  'Buffers:.+$', 'Buffers: xxx', 'ng'), 'Sort Method:.+$', 'Sort Method: xxx', 'ng');
+                          regexp_replace                           
+-------------------------------------------------------------------
+ Gather (actual rows=2 loops=1)                                   +
+   Output: "*VALUES*".column1, (ROW("*VALUES*".column1))          +
+   Workers Planned: 1                                             +
+   Workers Launched: 1                                            +
+   Single Copy: true                                              +
+   Buffers: xxx                                                   +
+   ->  Sort (actual rows=2 loops=1)                               +
+         Output: "*VALUES*".column1, (ROW("*VALUES*".column1))    +
+         Sort Key: (ROW("*VALUES*".column1))                      +
+         Buffers: xxx                                             +
+         Worker 0: actual rows=2 loops=1                          +
+           Sort Method: xxx                                       +
+           Buffers: xxx                                           +
+         ->  Values Scan on "*VALUES*" (actual rows=2 loops=1)    +
+               Output: "*VALUES*".column1, ROW("*VALUES*".column1)+
+               Worker 0: actual rows=2 loops=1                    +
+ 
+(1 row)
+
+reset force_parallel_mode;
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 96dfb7c8dd..cba15f7133 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -591,10 +591,10 @@ select * from explain_parallel_sort_stats();
          ->  Sort (actual rows=2000 loops=15)
                Sort Key: tenk1.ten
                Sort Method: quicksort  Memory: xxx
-               Worker 0:  Sort Method: quicksort  Memory: xxx
-               Worker 1:  Sort Method: quicksort  Memory: xxx
-               Worker 2:  Sort Method: quicksort  Memory: xxx
-               Worker 3:  Sort Method: quicksort  Memory: xxx
+               Worker 0: Sort Method: quicksort  Memory: xxx
+               Worker 1: Sort Method: quicksort  Memory: xxx
+               Worker 2: Sort Method: quicksort  Memory: xxx
+               Worker 3: Sort Method: quicksort  Memory: xxx
                ->  Parallel Seq Scan on tenk1 (actual rows=2000 loops=15)
                      Filter: (ten < 100)
 (14 rows)
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index d33a4e143d..d2b17dd3ea 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index f86f5c5682..acba391332 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -193,6 +193,7 @@ test: indexing
 test: partition_aggregate
 test: partition_info
 test: tuplesort
+test: explain
 test: event_trigger
 test: fast_default
 test: stats
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
new file mode 100644
index 0000000000..af443000da
--- /dev/null
+++ b/src/test/regress/sql/explain.sql
@@ -0,0 +1,98 @@
+--
+-- EXPLAIN
+--
+
+-- helper functions for examining explain output
+
+-- return a json explain of given query
+CREATE OR REPLACE FUNCTION json_explain(query text, variadic opts text[])
+  RETURNS json LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan json;
+BEGIN
+  EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT JSON) '), ', ') || query INTO STRICT plan;
+  RETURN plan;
+END;
+$$;
+
+-- return a text explain of given query
+CREATE OR REPLACE FUNCTION text_explain(query text, variadic opts text[])
+  RETURNS text LANGUAGE PLPGSQL AS $$
+DECLARE
+  plan_row RECORD;
+  plan text = '';
+BEGIN
+  FOR plan_row IN EXECUTE 'EXPLAIN (' || array_to_string(array_append(opts, 'FORMAT TEXT) '), ', ') || query LOOP
+    plan = plan || plan_row."QUERY PLAN" || E'\n';
+  END LOOP;
+
+  RETURN plan;
+END;
+$$;
+
+
+/*
+ * Takes a json object and processes its keys and values. For every
+ * matching key whose value is a number, set it to zero to facilitate
+ * comparisons for values that can vary across executions. If not a number,
+ * set the value to -1 to indicate we matched an unexpected key. Return
+ * non-matching keys and values unmodified.
+ */
+CREATE OR REPLACE FUNCTION json_normalize_numeric_keys_matching(obj json, regexp text)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(
+    key,
+    CASE
+      WHEN key ~ regexp THEN
+        CASE json_typeof(value)
+	  WHEN 'number' THEN 0::text::json
+	  ELSE (-1)::text::json
+	END
+      ELSE value
+    END
+  ), '{}')
+FROM
+  json_each(obj)
+$$;
+
+-- explain only worker information, normalizing any values that could vary across executions
+CREATE OR REPLACE FUNCTION explain_workers(plan_node json)
+  RETURNS json LANGUAGE SQL AS $$
+SELECT
+  coalesce(json_object_agg(key, value), '{}')
+FROM (
+  SELECT
+    key,
+    CASE key
+      WHEN 'Workers' THEN
+        (SELECT json_agg(
+	   json_normalize_numeric_keys_matching(worker, '(Blocks|Time)$|Sort Space Used')
+        )
+        FROM json_array_elements(value) AS workers(worker))
+      WHEN 'Plans' THEN
+	(SELECT json_agg(
+	  explain_workers(child)
+	)
+	FROM json_array_elements(value) AS children(child))
+    END AS value
+  FROM
+    json_each(plan_node) AS entries(key, value)
+  WHERE
+    key IN ('Workers', 'Plans')
+) AS plan_fields(key, value);
+$$;
+
+-- test that per-worker sort and buffers output is combined correctly in EXPLAIN
+set force_parallel_mode = true;
+
+SELECT explain_workers(json_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers') -> 0 -> 'Plan');
+
+SELECT regexp_replace(regexp_replace(text_explain($$
+  SELECT * FROM (VALUES(1),(2)) x ORDER BY x;
+$$, 'verbose', 'analyze', 'buffers', 'timing off', 'costs off', 'summary off'),
+  'Buffers:.+$', 'Buffers: xxx', 'ng'), 'Sort Method:.+$', 'Sort Method: xxx', 'ng');
+
+reset force_parallel_mode;
#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maciek Sakrejda (#18)
Re: Duplicate Workers entries in some EXPLAIN plans

Maciek Sakrejda <m.sakrejda@gmail.com> writes:

Great, thank you. I noticed in the CF patch tester app, the build
fails on Windows [1]. Investigating, I realized I had failed to fully
strip volatile EXPLAIN info (namely buffers) in TEXT mode due to a
bad regexp_replace.

You haven't gone nearly far enough in that direction. The proposed
output still relies on the assumption that the session will always
get as many workers as it asks for, which it will not. For previous
bitter experience in this department, see for instance commits 4ea03f3f4
and 13e8b2ee8.

TBH I am not sure that the proposed regression tests for this change
can be committed at all. Which is a bit of a problem perhaps, but
then again we don't have terribly good coverage for the existing code
either, for the same reason.

regards, tom lane

#20Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Tom Lane (#19)
Re: Duplicate Workers entries in some EXPLAIN plans

Okay. Does not getting as many workers as it asks for include
sometimes getting zero, completely changing the actual output? If so,
I'll submit a new version of the patch removing all tests--I was
hoping to improve coverage, but I guess this is not the way to start.
If not, can we keep the json tests at least if we only consider the
first worker?

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maciek Sakrejda (#20)
1 attachment(s)
Re: Duplicate Workers entries in some EXPLAIN plans

Maciek Sakrejda <m.sakrejda@gmail.com> writes:

Okay. Does not getting as many workers as it asks for include
sometimes getting zero, completely changing the actual output?

Yup :-(. We could possibly avoid that by running the explain
test by itself rather than in a parallel group, but I don't
especially want to add yet more non-parallelizable tests.

Meanwhile, I spent some time looking at the code, and wasn't very happy
with it. I'm on board with the general plan of redirecting EXPLAIN
output into per-worker buffers that we eventually recombine, but I think
that this particular coding is pretty unmaintainable/unextensible.
In particular, I'm really unhappy that the code is ignoring EXPLAIN's
grouping_state stack. That means that when it's formatting fields that
belong to the per-worker group, it's using the state-stack entry that
corresponds to the plan node's main level. This seems to accidentally
work, but that fact depends on a number of shaky assumptions:

* Before messing with any per-worker data, we've always emitted at
least one field in the plan node's main level, so that the state-stack
entry isn't at its initial state for the level.

* Before actually emitting the shunted-aside data, we've always emitted
a "Worker Number" field in correct format within the per-worker group,
so that the formatting state is now correct for a non-initial field.

* There is no formatting difference between any non-first fields in
a level (ie the state stack entries are basically just booleans),
so that it doesn't matter how many plan-node fields we emitted before
starting the per-worker data, so long as there was at least one, nor
does transiently abusing the plan node level's stack entry like this
break the formatting of subsequent plan-node-level fields.

Now maybe we'll never implement an output format that breaks that
last assumption, and maybe we'll never rearrange the EXPLAIN code
in a way that breaks either of the first two. But I don't like those
assumptions too much. I also didn't like the code's assumption that
all the non-text formats interpret es->indent the same.

I also noted an actual bug, which is that the patch fails regression
testing under force_parallel_mode = regress. This isn't really your
fault, because the issue is in this obscure and poorly-commented hack
in show_sort_info:

* You might think we should just skip this stanza entirely when
* es->hide_workers is true, but then we'd get no sort-method output at
* all. We have to make it look like worker 0's data is top-level data.
* Currently, we only bother with that for text-format output.

Nonetheless, it's broken.

So I spent some time hacking on this and came up with the attached.
It's noticeably more verbose than your patch, but it keeps the
output-format-aware code at what seems to me to be a maintainable
arm's-length distance from the parallel-worker hacking. TEXT is
still a special case of course :-(.

This patch just covers the code, I'm not taking any position yet
about what to do about the tests. I did tweak the code to eliminate
the one formatting difference in select_parallel (ie put two spaces
after "Worker N:", which I think reads better anyhow), so it
passes check-world as it stands.

Thoughts?

regards, tom lane

Attachments:

merge-explain-worker-output-v6.patchtext/x-diff; charset=us-ascii; name=merge-explain-worker-output-v6.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d..9e965d2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -123,11 +123,20 @@ static void ExplainSubPlans(List *plans, List *ancestors,
 							const char *relationship, ExplainState *es);
 static void ExplainCustomChildren(CustomScanState *css,
 								  List *ancestors, ExplainState *es);
+static ExplainWorkersState *ExplainCreateWorkersState(int num_workers);
+static void ExplainOpenWorker(int n, ExplainState *es);
+static void ExplainCloseWorker(int n, ExplainState *es);
+static void ExplainFlushWorkersState(ExplainState *es);
 static void ExplainProperty(const char *qlabel, const char *unit,
 							const char *value, bool numeric, ExplainState *es);
+static void ExplainOpenSetAsideGroup(const char *objtype, const char *labelname,
+									 bool labeled, int depth, ExplainState *es);
+static void ExplainSaveGroup(ExplainState *es, int depth, int *state_save);
+static void ExplainRestoreGroup(ExplainState *es, int depth, int *state_save);
 static void ExplainDummyGroup(const char *objtype, const char *labelname,
 							  ExplainState *es);
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
+static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
@@ -827,7 +836,7 @@ ExplainPrintJIT(ExplainState *es, int jit_flags,
 	/* for higher density, open code the text output format */
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 	{
-		appendStringInfoSpaces(es->str, es->indent * 2);
+		ExplainIndentText(es);
 		if (for_workers)
 			appendStringInfo(es->str, "JIT for worker %u:\n", worker_num);
 		else
@@ -836,7 +845,7 @@ ExplainPrintJIT(ExplainState *es, int jit_flags,
 
 		ExplainPropertyInteger("Functions", NULL, ji->created_functions, es);
 
-		appendStringInfoSpaces(es->str, es->indent * 2);
+		ExplainIndentText(es);
 		appendStringInfo(es->str, "Options: %s %s, %s %s, %s %s, %s %s\n",
 						 "Inlining", jit_flags & PGJIT_INLINE ? "true" : "false",
 						 "Optimization", jit_flags & PGJIT_OPT3 ? "true" : "false",
@@ -845,7 +854,7 @@ ExplainPrintJIT(ExplainState *es, int jit_flags,
 
 		if (es->analyze && es->timing)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str,
 							 "Timing: %s %.3f ms, %s %.3f ms, %s %.3f ms, %s %.3f ms, %s %.3f ms\n",
 							 "Generation", 1000.0 * INSTR_TIME_GET_DOUBLE(ji->generation_counter),
@@ -1074,9 +1083,10 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
  * optional name to be attached to the node.
  *
  * In text format, es->indent is controlled in this function since we only
- * want it to change at plan-node boundaries.  In non-text formats, es->indent
- * corresponds to the nesting depth of logical output groups, and therefore
- * is controlled by ExplainOpenGroup/ExplainCloseGroup.
+ * want it to change at plan-node boundaries (but a few subroutines will
+ * transiently increment it).  In non-text formats, es->indent corresponds
+ * to the nesting depth of logical output groups, and therefore is controlled
+ * by ExplainOpenGroup/ExplainCloseGroup.
  */
 static void
 ExplainNode(PlanState *planstate, List *ancestors,
@@ -1090,9 +1100,20 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	ExplainWorkersState *save_workers_state = es->workers_state;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/*
+	 * Prepare per-worker output buffers, if needed.  We'll append the data in
+	 * these to the main output string further down.
+	 */
+	if (planstate->worker_instrument && es->analyze && !es->hide_workers)
+		es->workers_state = ExplainCreateWorkersState(planstate->worker_instrument->num_workers);
+	else
+		es->workers_state = NULL;
+
+	/* Identify plan node type, and print generic details */
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1324,13 +1345,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	{
 		if (plan_name)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str, "%s\n", plan_name);
 			es->indent++;
 		}
 		if (es->indent)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfoString(es->str, "->  ");
 			es->indent += 2;
 		}
@@ -1574,6 +1595,56 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 		appendStringInfoChar(es->str, '\n');
 
+	/* prepare per-worker general execution details */
+	if (es->workers_state && es->verbose)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+
+		for (int n = 0; n < w->num_workers; n++)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+			}
+			else
+			{
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+
+			ExplainCloseWorker(n, es);
+		}
+	}
+
 	/* target list */
 	if (es->verbose)
 		show_plan_tlist(planstate, ancestors, es);
@@ -1885,76 +1956,30 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && !es->hide_workers &&
-		planstate->worker_instrument)
+	/* Prepare per-worker buffer usage */
+	if (es->workers_state && es->buffers && es->verbose)
 	{
 		WorkerInstrumentation *w = planstate->worker_instrument;
-		bool		opened_group = false;
-		int			n;
 
-		for (n = 0; n < w->num_workers; ++n)
+		for (int n = 0; n < w->num_workers; n++)
 		{
 			Instrumentation *instrument = &w->instrument[n];
 			double		nloops = instrument->nloops;
-			double		startup_ms;
-			double		total_ms;
-			double		rows;
 
 			if (nloops <= 0)
 				continue;
-			startup_ms = 1000.0 * instrument->startup / nloops;
-			total_ms = 1000.0 * instrument->total / nloops;
-			rows = instrument->ntuples / nloops;
 
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				appendStringInfo(es->str, "Worker %d: ", n);
-				if (es->timing)
-					appendStringInfo(es->str,
-									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
-									 startup_ms, total_ms, rows, nloops);
-				else
-					appendStringInfo(es->str,
-									 "actual rows=%.0f loops=%.0f\n",
-									 rows, nloops);
-				es->indent++;
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-				es->indent--;
-			}
-			else
-			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
-
-				if (es->timing)
-				{
-					ExplainPropertyFloat("Actual Startup Time", "ms",
-										 startup_ms, 3, es);
-					ExplainPropertyFloat("Actual Total Time", "ms",
-										 total_ms, 3, es);
-				}
-				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
-				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-
-				ExplainCloseGroup("Worker", NULL, true, es);
-			}
+			ExplainOpenWorker(n, es);
+			show_buffer_usage(es, &instrument->bufusage);
+			ExplainCloseWorker(n, es);
 		}
-
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
 	}
 
+	/* Show per-worker details, then pop that stack */
+	if (es->workers_state)
+		ExplainFlushWorkersState(es);
+	es->workers_state = save_workers_state;
+
 	/* Get ready to display the child plans */
 	haschildren = planstate->initPlan ||
 		outerPlanState(planstate) ||
@@ -2525,7 +2550,7 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 	{
 		bool		first = true;
 
-		appendStringInfoSpaces(es->str, es->indent * 2);
+		ExplainIndentText(es);
 		appendStringInfo(es->str, "Sampling: %s (", method_name);
 		foreach(lc, params)
 		{
@@ -2572,7 +2597,7 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 
 		if (es->format == EXPLAIN_FORMAT_TEXT)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str, "Sort Method: %s  %s: %ldkB\n",
 							 sortMethod, spaceType, spaceUsed);
 		}
@@ -2588,12 +2613,14 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	 * You might think we should just skip this stanza entirely when
 	 * es->hide_workers is true, but then we'd get no sort-method output at
 	 * all.  We have to make it look like worker 0's data is top-level data.
-	 * Currently, we only bother with that for text-format output.
+	 * This is easily done by just skipping the OpenWorker/CloseWorker calls.
+	 * Currently, we don't worry about the possibility that there are multiple
+	 * workers in such a case; if there are, duplicate output fields will be
+	 * emitted.
 	 */
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
-		bool		opened_group = false;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
@@ -2609,32 +2636,26 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 			spaceType = tuplesort_space_type_name(sinstrument->spaceType);
 			spaceUsed = sinstrument->spaceUsed;
 
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
 			if (es->format == EXPLAIN_FORMAT_TEXT)
 			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				if (n > 0 || !es->hide_workers)
-					appendStringInfo(es->str, "Worker %d:  ", n);
+				ExplainIndentText(es);
 				appendStringInfo(es->str,
 								 "Sort Method: %s  %s: %ldkB\n",
 								 sortMethod, spaceType, spaceUsed);
 			}
 			else
 			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
 				ExplainPropertyText("Sort Method", sortMethod, es);
 				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
 				ExplainPropertyText("Sort Space Type", spaceType, es);
-				ExplainCloseGroup("Worker", NULL, true, es);
 			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
 		}
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
 	}
 }
 
@@ -2721,7 +2742,7 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		else if (hinstrument.nbatch_original != hinstrument.nbatch ||
 				 hinstrument.nbuckets_original != hinstrument.nbuckets)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str,
 							 "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
 							 hinstrument.nbuckets,
@@ -2732,7 +2753,7 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		}
 		else
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str,
 							 "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
 							 hinstrument.nbuckets, hinstrument.nbatch,
@@ -2758,7 +2779,7 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
 	{
 		if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfoString(es->str, "Heap Blocks:");
 			if (planstate->exact_pages > 0)
 				appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
@@ -2894,7 +2915,7 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 		/* Show only positive counter values. */
 		if (has_shared || has_local || has_temp)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfoString(es->str, "Buffers:");
 
 			if (has_shared)
@@ -2949,7 +2970,7 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 		/* As above, show only positive counter values. */
 		if (has_timing)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfoString(es->str, "I/O Timings:");
 			if (!INSTR_TIME_IS_ZERO(usage->blk_read_time))
 				appendStringInfo(es->str, " read=%0.3f",
@@ -3237,7 +3258,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 			 */
 			if (es->format == EXPLAIN_FORMAT_TEXT)
 			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
+				ExplainIndentText(es);
 				appendStringInfoString(es->str,
 									   fdwroutine ? foperation : operation);
 			}
@@ -3427,6 +3448,150 @@ ExplainCustomChildren(CustomScanState *css, List *ancestors, ExplainState *es)
 }
 
 /*
+ * Create a per-plan-node workspace for collecting per-worker data.
+ */
+static ExplainWorkersState *
+ExplainCreateWorkersState(int num_workers)
+{
+	ExplainWorkersState *wstate;
+
+	wstate = (ExplainWorkersState *) palloc(sizeof(ExplainWorkersState));
+	wstate->num_workers = num_workers;
+	wstate->worker_inited = (bool *) palloc0(num_workers * sizeof(bool));
+	wstate->worker_str = (StringInfoData *)
+		palloc0(num_workers * sizeof(StringInfoData));
+	wstate->worker_state_save = (int *) palloc(num_workers * sizeof(int));
+	return wstate;
+}
+
+/*
+ * Begin or resume output into the set-aside group for worker N.
+ */
+static void
+ExplainOpenWorker(int n, ExplainState *es)
+{
+	ExplainWorkersState *wstate = es->workers_state;
+
+	Assert(wstate);
+	Assert(n >= 0 && n < wstate->num_workers);
+
+	/* Save prior output pointer */
+	wstate->prev_str = es->str;
+
+	if (!wstate->worker_inited[n])
+	{
+		/* First time through, so create the buffer for this worker */
+		initStringInfo(&wstate->worker_str[n]);
+		es->str = &wstate->worker_str[n];
+
+		/*
+		 * Push suitable initial formatting state for this worker's field
+		 * group.  We allow one extra logical nesting level, since this group
+		 * will eventually be wrapped in an outer "Workers" group.
+		 */
+		ExplainOpenSetAsideGroup("Worker", NULL, true, 2, es);
+
+		/*
+		 * In non-TEXT formats, we always emit a "Worker Number" field even if
+		 * there's no other data for this worker.
+		 */
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Worker Number", NULL, n, es);
+
+		wstate->worker_inited[n] = true;
+	}
+	else
+	{
+		/* Resuming output for a worker we've already emitted some data for */
+		es->str = &wstate->worker_str[n];
+
+		/* Restore formatting state saved by last ExplainCloseWorker() */
+		ExplainRestoreGroup(es, 2, &wstate->worker_state_save[n]);
+	}
+
+	/*
+	 * In TEXT format, prefix the first output line for this worker with
+	 * "Worker N:".  Then, any additional lines should be indented one more
+	 * stop than the "Worker N" line is.
+	 */
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		if (es->str->len == 0)
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str, "Worker %d:  ", n);
+		}
+
+		es->indent++;
+	}
+}
+
+/*
+ * End output for worker N --- must pair with previous ExplainOpenWorker call
+ */
+static void
+ExplainCloseWorker(int n, ExplainState *es)
+{
+	ExplainWorkersState *wstate = es->workers_state;
+
+	Assert(wstate);
+	Assert(n >= 0 && n < wstate->num_workers);
+	Assert(wstate->worker_inited[n]);
+
+	/*
+	 * Save formatting state in case we do another ExplainOpenWorker(), then
+	 * pop the formatting stack.
+	 */
+	ExplainSaveGroup(es, 2, &wstate->worker_state_save[n]);
+
+	/*
+	 * In TEXT format, if we didn't actually produce any output line(s) then
+	 * truncate off the partial line (this avoids bogus output if, for
+	 * instance, show_buffer_usage chooses not to print anything).  Also fix
+	 * up the indent level.
+	 */
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		while (es->str->len > 0 && es->str->data[es->str->len - 1] != '\n')
+			es->str->data[--(es->str->len)] = '\0';
+
+		es->indent--;
+	}
+
+	/* Restore prior output pointer */
+	es->str = wstate->prev_str;
+}
+
+/*
+ * Print per-worker info for current node, then free the ExplainWorkersState.
+ */
+static void
+ExplainFlushWorkersState(ExplainState *es)
+{
+	ExplainWorkersState *wstate = es->workers_state;
+
+	ExplainOpenGroup("Workers", "Workers", false, es);
+	for (int i = 0; i < wstate->num_workers; i++)
+	{
+		if (wstate->worker_inited[i])
+		{
+			/* This must match previous ExplainOpenSetAsideGroup call */
+			ExplainOpenGroup("Worker", NULL, true, es);
+			appendStringInfoString(es->str, wstate->worker_str[i].data);
+			ExplainCloseGroup("Worker", NULL, true, es);
+
+			pfree(wstate->worker_str[i].data);
+		}
+	}
+	ExplainCloseGroup("Workers", "Workers", false, es);
+
+	pfree(wstate->worker_inited);
+	pfree(wstate->worker_str);
+	pfree(wstate->worker_state_save);
+	pfree(wstate);
+}
+
+/*
  * Explain a property, such as sort keys or targets, that takes the form of
  * a list of unlabeled items.  "data" is a list of C strings.
  */
@@ -3439,7 +3604,7 @@ ExplainPropertyList(const char *qlabel, List *data, ExplainState *es)
 	switch (es->format)
 	{
 		case EXPLAIN_FORMAT_TEXT:
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str, "%s: ", qlabel);
 			foreach(lc, data)
 			{
@@ -3560,7 +3725,7 @@ ExplainProperty(const char *qlabel, const char *unit, const char *value,
 	switch (es->format)
 	{
 		case EXPLAIN_FORMAT_TEXT:
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			if (unit)
 				appendStringInfo(es->str, "%s: %s %s\n", qlabel, value, unit);
 			else
@@ -3752,6 +3917,117 @@ ExplainCloseGroup(const char *objtype, const char *labelname,
 }
 
 /*
+ * Open a group of related objects, without emitting actual data.
+ *
+ * Prepare the formatting state as though we were beginning a group with
+ * the identified properties, but don't actually emit anything.  Output
+ * subsequent to this call can be redirected into a separate output buffer,
+ * and then eventually appended to the main output buffer after doing a
+ * regular ExplainOpenGroup call (with the same parameters).
+ *
+ * The extra "depth" parameter is the new group's depth compared to current.
+ * It could be more than one, in case the eventual output will be enclosed
+ * in additional nesting group levels.  We assume we don't need to track
+ * formatting state for those levels while preparing this group's output.
+ *
+ * There is no ExplainCloseSetAsideGroup --- in current usage, we always
+ * pop this state with ExplainSaveGroup.
+ */
+static void
+ExplainOpenSetAsideGroup(const char *objtype, const char *labelname,
+						 bool labeled, int depth, ExplainState *es)
+{
+	switch (es->format)
+	{
+		case EXPLAIN_FORMAT_TEXT:
+			/* nothing to do */
+			break;
+
+		case EXPLAIN_FORMAT_XML:
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_JSON:
+			es->grouping_stack = lcons_int(0, es->grouping_stack);
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_YAML:
+			if (labelname)
+				es->grouping_stack = lcons_int(1, es->grouping_stack);
+			else
+				es->grouping_stack = lcons_int(0, es->grouping_stack);
+			es->indent += depth;
+			break;
+	}
+}
+
+/*
+ * Pop one level of grouping state, allowing for a re-push later.
+ *
+ * This is typically used after ExplainOpenSetAsideGroup; pass the
+ * same "depth" used for that.
+ *
+ * This should not emit any output.  If state needs to be saved,
+ * save it at *state_save.  Currently, an integer save area is sufficient
+ * for all formats, but we might need to revisit that someday.
+ */
+static void
+ExplainSaveGroup(ExplainState *es, int depth, int *state_save)
+{
+	switch (es->format)
+	{
+		case EXPLAIN_FORMAT_TEXT:
+			/* nothing to do */
+			break;
+
+		case EXPLAIN_FORMAT_XML:
+			es->indent -= depth;
+			break;
+
+		case EXPLAIN_FORMAT_JSON:
+			es->indent -= depth;
+			*state_save = linitial_int(es->grouping_stack);
+			es->grouping_stack = list_delete_first(es->grouping_stack);
+			break;
+
+		case EXPLAIN_FORMAT_YAML:
+			es->indent -= depth;
+			*state_save = linitial_int(es->grouping_stack);
+			es->grouping_stack = list_delete_first(es->grouping_stack);
+			break;
+	}
+}
+
+/*
+ * Re-push one level of grouping state, undoing the effects of ExplainSaveGroup.
+ */
+static void
+ExplainRestoreGroup(ExplainState *es, int depth, int *state_save)
+{
+	switch (es->format)
+	{
+		case EXPLAIN_FORMAT_TEXT:
+			/* nothing to do */
+			break;
+
+		case EXPLAIN_FORMAT_XML:
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_JSON:
+			es->grouping_stack = lcons_int(*state_save, es->grouping_stack);
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_YAML:
+			es->grouping_stack = lcons_int(*state_save, es->grouping_stack);
+			es->indent += depth;
+			break;
+	}
+}
+
+/*
  * Emit a "dummy" group that never has any members.
  *
  * objtype is the type of the group object, labelname is its label within
@@ -3913,6 +4189,21 @@ ExplainXMLTag(const char *tagname, int flags, ExplainState *es)
 }
 
 /*
+ * Indent a text-format line.
+ *
+ * We indent by two spaces per indentation level.  However, when emitting
+ * data for a parallel worker there might already be data on the current line
+ * (cf. ExplainOpenWorker); in that case, don't indent any more.
+ */
+static void
+ExplainIndentText(ExplainState *es)
+{
+	Assert(es->format == EXPLAIN_FORMAT_TEXT);
+	if (es->str->len == 0 || es->str->data[es->str->len - 1] == '\n')
+		appendStringInfoSpaces(es->str, es->indent * 2);
+}
+
+/*
  * Emit a JSON line ending.
  *
  * JSON requires a comma after each property but the last.  To facilitate this,
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 45027a7..853e206 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -25,6 +25,15 @@ typedef enum ExplainFormat
 	EXPLAIN_FORMAT_YAML
 } ExplainFormat;
 
+typedef struct ExplainWorkersState
+{
+	int			num_workers;	/* # of worker processes the plan used */
+	bool	   *worker_inited;	/* per-worker state-initialized flags */
+	StringInfoData *worker_str; /* per-worker transient output buffers */
+	int		   *worker_state_save;	/* per-worker grouping state save areas */
+	StringInfo	prev_str;		/* saved output buffer while redirecting */
+} ExplainWorkersState;
+
 typedef struct ExplainState
 {
 	StringInfo	str;			/* output buffer */
@@ -47,6 +56,8 @@ typedef struct ExplainState
 	List	   *deparse_cxt;	/* context list for deparsing expressions */
 	Bitmapset  *printed_subplans;	/* ids of SubPlans we've printed */
 	bool		hide_workers;	/* set if we find an invisible Gather */
+	/* state related to the current plan node */
+	ExplainWorkersState *workers_state; /* needed if parallel plan */
 } ExplainState;
 
 /* Hook for plugins to get control in ExplainOneQuery() */
#22Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Tom Lane (#21)
Re: Duplicate Workers entries in some EXPLAIN plans

Thanks for the thorough review. I obviously missed some critical
issues. I recognized some of the other maintainability problems, but
did not have a sense of how to best avoid them, being unfamiliar with
the code.

For what it's worth, this version makes sense to me.

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maciek Sakrejda (#22)
1 attachment(s)
Re: Duplicate Workers entries in some EXPLAIN plans

Maciek Sakrejda <m.sakrejda@gmail.com> writes:

For what it's worth, this version makes sense to me.

Thanks for looking. Here's a version that deals with the JIT
instrumentation. As Andres noted far upthread, that was also
really bogusly done before. Not only could you get multiple "JIT"
subnodes on a Gather node, but we failed to print the info at all
if the parallelism was expressed as Gather Merge rather than
Gather.

A side effect of this change is that per-worker JIT info is now
printed one plan level further down: before it was printed on
the Gather node, but now it's attached to the Gather's child,
because that's where we print other per-worker data. I don't
see anything particularly wrong with that, but it's another
change from the behavior today.

It's still really unclear to me how we could exercise any of
this behavior meaningfully in a regression test. I thought
for a little bit about using the TAP infrastructure instead
of a traditional-style test, but it seems like that doesn't
buy anything except for a bias towards ignoring details instead
of overspecifying them. Which is not much of an improvement.

regards, tom lane

Attachments:

merge-explain-worker-output-v7.patchtext/x-diff; charset=us-ascii; name=merge-explain-worker-output-v7.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d..fe3c83b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -57,6 +57,8 @@ static void ExplainOneQuery(Query *query, int cursorOptions,
 							IntoClause *into, ExplainState *es,
 							const char *queryString, ParamListInfo params,
 							QueryEnvironment *queryEnv);
+static void ExplainPrintJIT(ExplainState *es, int jit_flags,
+							JitInstrumentation *ji);
 static void report_triggers(ResultRelInfo *rInfo, bool show_relname,
 							ExplainState *es);
 static double elapsed_time(instr_time *starttime);
@@ -123,11 +125,20 @@ static void ExplainSubPlans(List *plans, List *ancestors,
 							const char *relationship, ExplainState *es);
 static void ExplainCustomChildren(CustomScanState *css,
 								  List *ancestors, ExplainState *es);
+static ExplainWorkersState *ExplainCreateWorkersState(int num_workers);
+static void ExplainOpenWorker(int n, ExplainState *es);
+static void ExplainCloseWorker(int n, ExplainState *es);
+static void ExplainFlushWorkersState(ExplainState *es);
 static void ExplainProperty(const char *qlabel, const char *unit,
 							const char *value, bool numeric, ExplainState *es);
+static void ExplainOpenSetAsideGroup(const char *objtype, const char *labelname,
+									 bool labeled, int depth, ExplainState *es);
+static void ExplainSaveGroup(ExplainState *es, int depth, int *state_save);
+static void ExplainRestoreGroup(ExplainState *es, int depth, int *state_save);
 static void ExplainDummyGroup(const char *objtype, const char *labelname,
 							  ExplainState *es);
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
+static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
@@ -790,31 +801,22 @@ ExplainPrintJITSummary(ExplainState *es, QueryDesc *queryDesc)
 	if (queryDesc->estate->es_jit_worker_instr)
 		InstrJitAgg(&ji, queryDesc->estate->es_jit_worker_instr);
 
-	ExplainPrintJIT(es, queryDesc->estate->es_jit_flags, &ji, -1);
+	ExplainPrintJIT(es, queryDesc->estate->es_jit_flags, &ji);
 }
 
 /*
  * ExplainPrintJIT -
  *	  Append information about JITing to es->str.
- *
- * Can be used to print the JIT instrumentation of the backend (worker_num =
- * -1) or that of a specific worker (worker_num = ...).
  */
-void
-ExplainPrintJIT(ExplainState *es, int jit_flags,
-				JitInstrumentation *ji, int worker_num)
+static void
+ExplainPrintJIT(ExplainState *es, int jit_flags, JitInstrumentation *ji)
 {
 	instr_time	total_time;
-	bool		for_workers = (worker_num >= 0);
 
 	/* don't print information if no JITing happened */
 	if (!ji || ji->created_functions == 0)
 		return;
 
-	/* don't print per-worker info if we're supposed to hide that */
-	if (for_workers && es->hide_workers)
-		return;
-
 	/* calculate total time */
 	INSTR_TIME_SET_ZERO(total_time);
 	INSTR_TIME_ADD(total_time, ji->generation_counter);
@@ -827,16 +829,13 @@ ExplainPrintJIT(ExplainState *es, int jit_flags,
 	/* for higher density, open code the text output format */
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 	{
-		appendStringInfoSpaces(es->str, es->indent * 2);
-		if (for_workers)
-			appendStringInfo(es->str, "JIT for worker %u:\n", worker_num);
-		else
-			appendStringInfoString(es->str, "JIT:\n");
-		es->indent += 1;
+		ExplainIndentText(es);
+		appendStringInfoString(es->str, "JIT:\n");
+		es->indent++;
 
 		ExplainPropertyInteger("Functions", NULL, ji->created_functions, es);
 
-		appendStringInfoSpaces(es->str, es->indent * 2);
+		ExplainIndentText(es);
 		appendStringInfo(es->str, "Options: %s %s, %s %s, %s %s, %s %s\n",
 						 "Inlining", jit_flags & PGJIT_INLINE ? "true" : "false",
 						 "Optimization", jit_flags & PGJIT_OPT3 ? "true" : "false",
@@ -845,7 +844,7 @@ ExplainPrintJIT(ExplainState *es, int jit_flags,
 
 		if (es->analyze && es->timing)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str,
 							 "Timing: %s %.3f ms, %s %.3f ms, %s %.3f ms, %s %.3f ms, %s %.3f ms\n",
 							 "Generation", 1000.0 * INSTR_TIME_GET_DOUBLE(ji->generation_counter),
@@ -855,11 +854,10 @@ ExplainPrintJIT(ExplainState *es, int jit_flags,
 							 "Total", 1000.0 * INSTR_TIME_GET_DOUBLE(total_time));
 		}
 
-		es->indent -= 1;
+		es->indent--;
 	}
 	else
 	{
-		ExplainPropertyInteger("Worker Number", NULL, worker_num, es);
 		ExplainPropertyInteger("Functions", NULL, ji->created_functions, es);
 
 		ExplainOpenGroup("Options", "Options", true, es);
@@ -1074,9 +1072,10 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
  * optional name to be attached to the node.
  *
  * In text format, es->indent is controlled in this function since we only
- * want it to change at plan-node boundaries.  In non-text formats, es->indent
- * corresponds to the nesting depth of logical output groups, and therefore
- * is controlled by ExplainOpenGroup/ExplainCloseGroup.
+ * want it to change at plan-node boundaries (but a few subroutines will
+ * transiently increment it).  In non-text formats, es->indent corresponds
+ * to the nesting depth of logical output groups, and therefore is controlled
+ * by ExplainOpenGroup/ExplainCloseGroup.
  */
 static void
 ExplainNode(PlanState *planstate, List *ancestors,
@@ -1090,9 +1089,20 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	ExplainWorkersState *save_workers_state = es->workers_state;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/*
+	 * Prepare per-worker output buffers, if needed.  We'll append the data in
+	 * these to the main output string further down.
+	 */
+	if (planstate->worker_instrument && es->analyze && !es->hide_workers)
+		es->workers_state = ExplainCreateWorkersState(planstate->worker_instrument->num_workers);
+	else
+		es->workers_state = NULL;
+
+	/* Identify plan node type, and print generic details */
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1324,13 +1334,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	{
 		if (plan_name)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str, "%s\n", plan_name);
 			es->indent++;
 		}
 		if (es->indent)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfoString(es->str, "->  ");
 			es->indent += 2;
 		}
@@ -1574,6 +1584,56 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 		appendStringInfoChar(es->str, '\n');
 
+	/* prepare per-worker general execution details */
+	if (es->workers_state && es->verbose)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+
+		for (int n = 0; n < w->num_workers; n++)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+			}
+			else
+			{
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+
+			ExplainCloseWorker(n, es);
+		}
+	}
+
 	/* target list */
 	if (es->verbose)
 		show_plan_tlist(planstate, ancestors, es);
@@ -1684,24 +1744,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   nworkers, es);
 				}
 
-				/*
-				 * Print per-worker Jit instrumentation. Use same conditions
-				 * as for the leader's JIT instrumentation, see comment there.
-				 */
-				if (es->costs && es->verbose &&
-					outerPlanState(planstate)->worker_jit_instrument)
-				{
-					PlanState  *child = outerPlanState(planstate);
-					int			n;
-					SharedJitInstrumentation *w = child->worker_jit_instrument;
-
-					for (n = 0; n < w->num_workers; ++n)
-					{
-						ExplainPrintJIT(es, child->state->es_jit_flags,
-										&w->jit_instr[n], n);
-					}
-				}
-
 				if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
 					ExplainPropertyBool("Single Copy", gather->single_copy, es);
 			}
@@ -1881,80 +1923,54 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			break;
 	}
 
+	/*
+	 * Prepare per-worker JIT instrumentation.  As with the overall JIT
+	 * summary, this is printed only if printing costs is enabled.
+	 */
+	if (es->workers_state && es->costs && es->verbose)
+	{
+		SharedJitInstrumentation *w = planstate->worker_jit_instrument;
+
+		if (w)
+		{
+			for (int n = 0; n < w->num_workers; n++)
+			{
+				ExplainOpenWorker(n, es);
+				ExplainPrintJIT(es, planstate->state->es_jit_flags,
+								&w->jit_instr[n]);
+				ExplainCloseWorker(n, es);
+			}
+		}
+	}
+
 	/* Show buffer usage */
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && !es->hide_workers &&
-		planstate->worker_instrument)
+	/* Prepare per-worker buffer usage */
+	if (es->workers_state && es->buffers && es->verbose)
 	{
 		WorkerInstrumentation *w = planstate->worker_instrument;
-		bool		opened_group = false;
-		int			n;
 
-		for (n = 0; n < w->num_workers; ++n)
+		for (int n = 0; n < w->num_workers; n++)
 		{
 			Instrumentation *instrument = &w->instrument[n];
 			double		nloops = instrument->nloops;
-			double		startup_ms;
-			double		total_ms;
-			double		rows;
 
 			if (nloops <= 0)
 				continue;
-			startup_ms = 1000.0 * instrument->startup / nloops;
-			total_ms = 1000.0 * instrument->total / nloops;
-			rows = instrument->ntuples / nloops;
-
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				appendStringInfo(es->str, "Worker %d: ", n);
-				if (es->timing)
-					appendStringInfo(es->str,
-									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
-									 startup_ms, total_ms, rows, nloops);
-				else
-					appendStringInfo(es->str,
-									 "actual rows=%.0f loops=%.0f\n",
-									 rows, nloops);
-				es->indent++;
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-				es->indent--;
-			}
-			else
-			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
-
-				if (es->timing)
-				{
-					ExplainPropertyFloat("Actual Startup Time", "ms",
-										 startup_ms, 3, es);
-					ExplainPropertyFloat("Actual Total Time", "ms",
-										 total_ms, 3, es);
-				}
-				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
-				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
 
-				if (es->buffers)
-					show_buffer_usage(es, &instrument->bufusage);
-
-				ExplainCloseGroup("Worker", NULL, true, es);
-			}
+			ExplainOpenWorker(n, es);
+			show_buffer_usage(es, &instrument->bufusage);
+			ExplainCloseWorker(n, es);
 		}
-
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
 	}
 
+	/* Show per-worker details for this plan node, then pop that stack */
+	if (es->workers_state)
+		ExplainFlushWorkersState(es);
+	es->workers_state = save_workers_state;
+
 	/* Get ready to display the child plans */
 	haschildren = planstate->initPlan ||
 		outerPlanState(planstate) ||
@@ -2525,7 +2541,7 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 	{
 		bool		first = true;
 
-		appendStringInfoSpaces(es->str, es->indent * 2);
+		ExplainIndentText(es);
 		appendStringInfo(es->str, "Sampling: %s (", method_name);
 		foreach(lc, params)
 		{
@@ -2572,7 +2588,7 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 
 		if (es->format == EXPLAIN_FORMAT_TEXT)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str, "Sort Method: %s  %s: %ldkB\n",
 							 sortMethod, spaceType, spaceUsed);
 		}
@@ -2588,12 +2604,14 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	 * You might think we should just skip this stanza entirely when
 	 * es->hide_workers is true, but then we'd get no sort-method output at
 	 * all.  We have to make it look like worker 0's data is top-level data.
-	 * Currently, we only bother with that for text-format output.
+	 * This is easily done by just skipping the OpenWorker/CloseWorker calls.
+	 * Currently, we don't worry about the possibility that there are multiple
+	 * workers in such a case; if there are, duplicate output fields will be
+	 * emitted.
 	 */
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
-		bool		opened_group = false;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
@@ -2609,32 +2627,26 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 			spaceType = tuplesort_space_type_name(sinstrument->spaceType);
 			spaceUsed = sinstrument->spaceUsed;
 
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
 			if (es->format == EXPLAIN_FORMAT_TEXT)
 			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				if (n > 0 || !es->hide_workers)
-					appendStringInfo(es->str, "Worker %d:  ", n);
+				ExplainIndentText(es);
 				appendStringInfo(es->str,
 								 "Sort Method: %s  %s: %ldkB\n",
 								 sortMethod, spaceType, spaceUsed);
 			}
 			else
 			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
 				ExplainPropertyText("Sort Method", sortMethod, es);
 				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
 				ExplainPropertyText("Sort Space Type", spaceType, es);
-				ExplainCloseGroup("Worker", NULL, true, es);
 			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
 		}
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
 	}
 }
 
@@ -2721,7 +2733,7 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		else if (hinstrument.nbatch_original != hinstrument.nbatch ||
 				 hinstrument.nbuckets_original != hinstrument.nbuckets)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str,
 							 "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
 							 hinstrument.nbuckets,
@@ -2732,7 +2744,7 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		}
 		else
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str,
 							 "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
 							 hinstrument.nbuckets, hinstrument.nbatch,
@@ -2758,7 +2770,7 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
 	{
 		if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfoString(es->str, "Heap Blocks:");
 			if (planstate->exact_pages > 0)
 				appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
@@ -2894,7 +2906,7 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 		/* Show only positive counter values. */
 		if (has_shared || has_local || has_temp)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfoString(es->str, "Buffers:");
 
 			if (has_shared)
@@ -2949,7 +2961,7 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 		/* As above, show only positive counter values. */
 		if (has_timing)
 		{
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfoString(es->str, "I/O Timings:");
 			if (!INSTR_TIME_IS_ZERO(usage->blk_read_time))
 				appendStringInfo(es->str, " read=%0.3f",
@@ -3237,7 +3249,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 			 */
 			if (es->format == EXPLAIN_FORMAT_TEXT)
 			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
+				ExplainIndentText(es);
 				appendStringInfoString(es->str,
 									   fdwroutine ? foperation : operation);
 			}
@@ -3427,6 +3439,158 @@ ExplainCustomChildren(CustomScanState *css, List *ancestors, ExplainState *es)
 }
 
 /*
+ * Create a per-plan-node workspace for collecting per-worker data.
+ *
+ * Output related to each worker will be temporarily "set aside" into a
+ * separate buffer, which we'll merge into the main output stream once
+ * we've processed all data for the plan node.  This makes it feasible to
+ * generate a coherent sub-group of fields for each worker, even though the
+ * code that produces the fields is in several different places in this file.
+ * Formatting of such a set-aside field group is managed by
+ * ExplainOpenSetAsideGroup and ExplainSaveGroup/ExplainRestoreGroup.
+ */
+static ExplainWorkersState *
+ExplainCreateWorkersState(int num_workers)
+{
+	ExplainWorkersState *wstate;
+
+	wstate = (ExplainWorkersState *) palloc(sizeof(ExplainWorkersState));
+	wstate->num_workers = num_workers;
+	wstate->worker_inited = (bool *) palloc0(num_workers * sizeof(bool));
+	wstate->worker_str = (StringInfoData *)
+		palloc0(num_workers * sizeof(StringInfoData));
+	wstate->worker_state_save = (int *) palloc(num_workers * sizeof(int));
+	return wstate;
+}
+
+/*
+ * Begin or resume output into the set-aside group for worker N.
+ */
+static void
+ExplainOpenWorker(int n, ExplainState *es)
+{
+	ExplainWorkersState *wstate = es->workers_state;
+
+	Assert(wstate);
+	Assert(n >= 0 && n < wstate->num_workers);
+
+	/* Save prior output buffer pointer */
+	wstate->prev_str = es->str;
+
+	if (!wstate->worker_inited[n])
+	{
+		/* First time through, so create the buffer for this worker */
+		initStringInfo(&wstate->worker_str[n]);
+		es->str = &wstate->worker_str[n];
+
+		/*
+		 * Push suitable initial formatting state for this worker's field
+		 * group.  We allow one extra logical nesting level, since this group
+		 * will eventually be wrapped in an outer "Workers" group.
+		 */
+		ExplainOpenSetAsideGroup("Worker", NULL, true, 2, es);
+
+		/*
+		 * In non-TEXT formats we always emit a "Worker Number" field, even if
+		 * there's no other data for this worker.
+		 */
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Worker Number", NULL, n, es);
+
+		wstate->worker_inited[n] = true;
+	}
+	else
+	{
+		/* Resuming output for a worker we've already emitted some data for */
+		es->str = &wstate->worker_str[n];
+
+		/* Restore formatting state saved by last ExplainCloseWorker() */
+		ExplainRestoreGroup(es, 2, &wstate->worker_state_save[n]);
+	}
+
+	/*
+	 * In TEXT format, prefix the first output line for this worker with
+	 * "Worker N:".  Then, any additional lines should be indented one more
+	 * stop than the "Worker N" line is.
+	 */
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		if (es->str->len == 0)
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str, "Worker %d:  ", n);
+		}
+
+		es->indent++;
+	}
+}
+
+/*
+ * End output for worker N --- must pair with previous ExplainOpenWorker call
+ */
+static void
+ExplainCloseWorker(int n, ExplainState *es)
+{
+	ExplainWorkersState *wstate = es->workers_state;
+
+	Assert(wstate);
+	Assert(n >= 0 && n < wstate->num_workers);
+	Assert(wstate->worker_inited[n]);
+
+	/*
+	 * Save formatting state in case we do another ExplainOpenWorker(), then
+	 * pop the formatting stack.
+	 */
+	ExplainSaveGroup(es, 2, &wstate->worker_state_save[n]);
+
+	/*
+	 * In TEXT format, if we didn't actually produce any output line(s) then
+	 * truncate off the partial line.  (This is to avoid bogus output if, for
+	 * instance, show_buffer_usage chooses not to print anything for the
+	 * worker.)  Also fix up the indent level.
+	 */
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		while (es->str->len > 0 && es->str->data[es->str->len - 1] != '\n')
+			es->str->data[--(es->str->len)] = '\0';
+
+		es->indent--;
+	}
+
+	/* Restore prior output buffer pointer */
+	es->str = wstate->prev_str;
+}
+
+/*
+ * Print per-worker info for current node, then free the ExplainWorkersState.
+ */
+static void
+ExplainFlushWorkersState(ExplainState *es)
+{
+	ExplainWorkersState *wstate = es->workers_state;
+
+	ExplainOpenGroup("Workers", "Workers", false, es);
+	for (int i = 0; i < wstate->num_workers; i++)
+	{
+		if (wstate->worker_inited[i])
+		{
+			/* This must match previous ExplainOpenSetAsideGroup call */
+			ExplainOpenGroup("Worker", NULL, true, es);
+			appendStringInfoString(es->str, wstate->worker_str[i].data);
+			ExplainCloseGroup("Worker", NULL, true, es);
+
+			pfree(wstate->worker_str[i].data);
+		}
+	}
+	ExplainCloseGroup("Workers", "Workers", false, es);
+
+	pfree(wstate->worker_inited);
+	pfree(wstate->worker_str);
+	pfree(wstate->worker_state_save);
+	pfree(wstate);
+}
+
+/*
  * Explain a property, such as sort keys or targets, that takes the form of
  * a list of unlabeled items.  "data" is a list of C strings.
  */
@@ -3439,7 +3603,7 @@ ExplainPropertyList(const char *qlabel, List *data, ExplainState *es)
 	switch (es->format)
 	{
 		case EXPLAIN_FORMAT_TEXT:
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			appendStringInfo(es->str, "%s: ", qlabel);
 			foreach(lc, data)
 			{
@@ -3560,7 +3724,7 @@ ExplainProperty(const char *qlabel, const char *unit, const char *value,
 	switch (es->format)
 	{
 		case EXPLAIN_FORMAT_TEXT:
-			appendStringInfoSpaces(es->str, es->indent * 2);
+			ExplainIndentText(es);
 			if (unit)
 				appendStringInfo(es->str, "%s: %s %s\n", qlabel, value, unit);
 			else
@@ -3752,6 +3916,117 @@ ExplainCloseGroup(const char *objtype, const char *labelname,
 }
 
 /*
+ * Open a group of related objects, without emitting actual data.
+ *
+ * Prepare the formatting state as though we were beginning a group with
+ * the identified properties, but don't actually emit anything.  Output
+ * subsequent to this call can be redirected into a separate output buffer,
+ * and then eventually appended to the main output buffer after doing a
+ * regular ExplainOpenGroup call (with the same parameters).
+ *
+ * The extra "depth" parameter is the new group's depth compared to current.
+ * It could be more than one, in case the eventual output will be enclosed
+ * in additional nesting group levels.  We assume we don't need to track
+ * formatting state for those levels while preparing this group's output.
+ *
+ * There is no ExplainCloseSetAsideGroup --- in current usage, we always
+ * pop this state with ExplainSaveGroup.
+ */
+static void
+ExplainOpenSetAsideGroup(const char *objtype, const char *labelname,
+						 bool labeled, int depth, ExplainState *es)
+{
+	switch (es->format)
+	{
+		case EXPLAIN_FORMAT_TEXT:
+			/* nothing to do */
+			break;
+
+		case EXPLAIN_FORMAT_XML:
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_JSON:
+			es->grouping_stack = lcons_int(0, es->grouping_stack);
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_YAML:
+			if (labelname)
+				es->grouping_stack = lcons_int(1, es->grouping_stack);
+			else
+				es->grouping_stack = lcons_int(0, es->grouping_stack);
+			es->indent += depth;
+			break;
+	}
+}
+
+/*
+ * Pop one level of grouping state, allowing for a re-push later.
+ *
+ * This is typically used after ExplainOpenSetAsideGroup; pass the
+ * same "depth" used for that.
+ *
+ * This should not emit any output.  If state needs to be saved,
+ * save it at *state_save.  Currently, an integer save area is sufficient
+ * for all formats, but we might need to revisit that someday.
+ */
+static void
+ExplainSaveGroup(ExplainState *es, int depth, int *state_save)
+{
+	switch (es->format)
+	{
+		case EXPLAIN_FORMAT_TEXT:
+			/* nothing to do */
+			break;
+
+		case EXPLAIN_FORMAT_XML:
+			es->indent -= depth;
+			break;
+
+		case EXPLAIN_FORMAT_JSON:
+			es->indent -= depth;
+			*state_save = linitial_int(es->grouping_stack);
+			es->grouping_stack = list_delete_first(es->grouping_stack);
+			break;
+
+		case EXPLAIN_FORMAT_YAML:
+			es->indent -= depth;
+			*state_save = linitial_int(es->grouping_stack);
+			es->grouping_stack = list_delete_first(es->grouping_stack);
+			break;
+	}
+}
+
+/*
+ * Re-push one level of grouping state, undoing the effects of ExplainSaveGroup.
+ */
+static void
+ExplainRestoreGroup(ExplainState *es, int depth, int *state_save)
+{
+	switch (es->format)
+	{
+		case EXPLAIN_FORMAT_TEXT:
+			/* nothing to do */
+			break;
+
+		case EXPLAIN_FORMAT_XML:
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_JSON:
+			es->grouping_stack = lcons_int(*state_save, es->grouping_stack);
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_YAML:
+			es->grouping_stack = lcons_int(*state_save, es->grouping_stack);
+			es->indent += depth;
+			break;
+	}
+}
+
+/*
  * Emit a "dummy" group that never has any members.
  *
  * objtype is the type of the group object, labelname is its label within
@@ -3913,6 +4188,21 @@ ExplainXMLTag(const char *tagname, int flags, ExplainState *es)
 }
 
 /*
+ * Indent a text-format line.
+ *
+ * We indent by two spaces per indentation level.  However, when emitting
+ * data for a parallel worker there might already be data on the current line
+ * (cf. ExplainOpenWorker); in that case, don't indent any more.
+ */
+static void
+ExplainIndentText(ExplainState *es)
+{
+	Assert(es->format == EXPLAIN_FORMAT_TEXT);
+	if (es->str->len == 0 || es->str->data[es->str->len - 1] == '\n')
+		appendStringInfoSpaces(es->str, es->indent * 2);
+}
+
+/*
  * Emit a JSON line ending.
  *
  * JSON requires a comma after each property but the last.  To facilitate this,
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 45027a7..54f6240 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -25,6 +25,15 @@ typedef enum ExplainFormat
 	EXPLAIN_FORMAT_YAML
 } ExplainFormat;
 
+typedef struct ExplainWorkersState
+{
+	int			num_workers;	/* # of worker processes the plan used */
+	bool	   *worker_inited;	/* per-worker state-initialized flags */
+	StringInfoData *worker_str; /* per-worker transient output buffers */
+	int		   *worker_state_save;	/* per-worker grouping state save areas */
+	StringInfo	prev_str;		/* saved output buffer while redirecting */
+} ExplainWorkersState;
+
 typedef struct ExplainState
 {
 	StringInfo	str;			/* output buffer */
@@ -47,6 +56,8 @@ typedef struct ExplainState
 	List	   *deparse_cxt;	/* context list for deparsing expressions */
 	Bitmapset  *printed_subplans;	/* ids of SubPlans we've printed */
 	bool		hide_workers;	/* set if we find an invisible Gather */
+	/* state related to the current plan node */
+	ExplainWorkersState *workers_state; /* needed if parallel plan */
 } ExplainState;
 
 /* Hook for plugins to get control in ExplainOneQuery() */
@@ -84,8 +95,6 @@ extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
 extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
 
 extern void ExplainPrintJITSummary(ExplainState *es, QueryDesc *queryDesc);
-extern void ExplainPrintJIT(ExplainState *es, int jit_flags,
-							struct JitInstrumentation *jit_instr, int worker_num);
 
 extern void ExplainQueryText(ExplainState *es, QueryDesc *queryDesc);
 
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#23)
Re: Duplicate Workers entries in some EXPLAIN plans

I wrote:

It's still really unclear to me how we could exercise any of
this behavior meaningfully in a regression test. I thought
for a little bit about using the TAP infrastructure instead
of a traditional-style test, but it seems like that doesn't
buy anything except for a bias towards ignoring details instead
of overspecifying them. Which is not much of an improvement.

After further thought, I decided that about the best we can do
is suppress the "Workers" field in the regression test's expected
output. This still gives us code coverage of the relevant code,
and we can check that the output is valid JSON before we strip it,
so it's not a dead loss.

I rewrote the test script a bit to add some coverage of XML and YAML
output formats, since we had exactly none before, and pushed it.

regards, tom lane

#25Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#23)
Re: Duplicate Workers entries in some EXPLAIN plans

Hi,

On 2020-01-25 14:23:50 -0500, Tom Lane wrote:

A side effect of this change is that per-worker JIT info is now
printed one plan level further down: before it was printed on
the Gather node, but now it's attached to the Gather's child,
because that's where we print other per-worker data. I don't
see anything particularly wrong with that, but it's another
change from the behavior today.

Yea, I don't see any need to be bothered by that.

It's still really unclear to me how we could exercise any of
this behavior meaningfully in a regression test. I thought
for a little bit about using the TAP infrastructure instead
of a traditional-style test, but it seems like that doesn't
buy anything except for a bias towards ignoring details instead
of overspecifying them. Which is not much of an improvement.

Hm. I'd like to avoid needing TAP for this kind of thing, it'll just
make it much more expensive in just about all ways.

Testing JIT explain is "easy" enough I think, I've posted a patch in
another thread, which just skips over the region of the test if JIT is
not available. See e.g. 0008 in
/messages/by-id/20191029000229.fkjmuld3g7f2jq7i@alap3.anarazel.de
(that's a thread I'd love your input in btw)

It's harder for parallel query though. If parallel query were able to
reuse workers, we could "just" check at the beginning of the test if we
are able to get the workers we need, and then skip the rest of the tests
if not. But as things stand that doesn't guarantee anything.

I wonder if we could introduce a debug GUC that makes parallel worker
acquisition just retry in a loop, for a time determined by the GUC. That
obviously would be a bad idea to do in a production setup, but it could
be good enough for regression tests? There are some deadlock dangers,
but I'm not sure they really matter for the tests.

+	/* prepare per-worker general execution details */
+	if (es->workers_state && es->verbose)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+
+		for (int n = 0; n < w->num_workers; n++)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+			}
+			else
+			{
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+
+			ExplainCloseWorker(n, es);
+		}
+	}

I'd personally move this into a separate function, given the patches
moves the code around already. ExplainNode() is already hard enough to
navigate...

It probably also makes sense to move everything but the nloops <= 0,
ExplainOpenWorker/ExplainCloseWorker into its own function. As far as I
can tell it now should be identical between the non-parallel case?

+/*
+ * Begin or resume output into the set-aside group for worker N.
+ */
+static void

Would it make sense to make these functions non-static? It seems
plausible that code for a custom node or such would want to add
its own information?

+ExplainOpenWorker(int n, ExplainState *es)
+{
+	ExplainWorkersState *wstate = es->workers_state;
+
+	Assert(wstate);
+	Assert(n >= 0 && n < wstate->num_workers);
+
+	/* Save prior output buffer pointer */
+	wstate->prev_str = es->str;
+
+	if (!wstate->worker_inited[n])
+	{
+		/* First time through, so create the buffer for this worker */
+		initStringInfo(&wstate->worker_str[n]);
+		es->str = &wstate->worker_str[n];
+
+		/*
+		 * Push suitable initial formatting state for this worker's field
+		 * group.  We allow one extra logical nesting level, since this group
+		 * will eventually be wrapped in an outer "Workers" group.
+		 */
+		ExplainOpenSetAsideGroup("Worker", NULL, true, 2, es);
+
+		/*
+		 * In non-TEXT formats we always emit a "Worker Number" field, even if
+		 * there's no other data for this worker.
+		 */
+		if (es->format != EXPLAIN_FORMAT_TEXT)
+			ExplainPropertyInteger("Worker Number", NULL, n, es);
+
+		wstate->worker_inited[n] = true;
+	}
+	else
+	{
+		/* Resuming output for a worker we've already emitted some data for */
+		es->str = &wstate->worker_str[n];
+
+		/* Restore formatting state saved by last ExplainCloseWorker() */
+		ExplainRestoreGroup(es, 2, &wstate->worker_state_save[n]);
+	}
+
+	/*
+	 * In TEXT format, prefix the first output line for this worker with
+	 * "Worker N:".  Then, any additional lines should be indented one more
+	 * stop than the "Worker N" line is.
+	 */
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		if (es->str->len == 0)
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str, "Worker %d:  ", n);
+		}
+
+		es->indent++;
+	}
+}

I don't quite get the Worker %d bit. Why are we not outputting that in
the !worker_inited block?

+/*
+ * Print per-worker info for current node, then free the ExplainWorkersState.
+ */
+static void
+ExplainFlushWorkersState(ExplainState *es)
+{
+	ExplainWorkersState *wstate = es->workers_state;
+
+	ExplainOpenGroup("Workers", "Workers", false, es);
+	for (int i = 0; i < wstate->num_workers; i++)
+	{
+		if (wstate->worker_inited[i])
+		{
+			/* This must match previous ExplainOpenSetAsideGroup call */
+			ExplainOpenGroup("Worker", NULL, true, es);
+			appendStringInfoString(es->str, wstate->worker_str[i].data);

Probably never matters, but given we do have the string length already,
we could use appendBinaryStringInfo().

+ ExplainCloseGroup("Worker", NULL, true, es);

Not related to this patch: I never got why we maintain a grouping stack
for some things, but don't do it for the group properties
themselves.

/*
+ * Open a group of related objects, without emitting actual data.
+ *
+ * Prepare the formatting state as though we were beginning a group with
+ * the identified properties, but don't actually emit anything.  Output
+ * subsequent to this call can be redirected into a separate output buffer,
+ * and then eventually appended to the main output buffer after doing a
+ * regular ExplainOpenGroup call (with the same parameters).
+ *
+ * The extra "depth" parameter is the new group's depth compared to current.
+ * It could be more than one, in case the eventual output will be enclosed
+ * in additional nesting group levels.  We assume we don't need to track
+ * formatting state for those levels while preparing this group's output.
+ *
+ * There is no ExplainCloseSetAsideGroup --- in current usage, we always
+ * pop this state with ExplainSaveGroup.
+ */
+static void
+ExplainOpenSetAsideGroup(const char *objtype, const char *labelname,
+						 bool labeled, int depth, ExplainState *es)
+{
+	switch (es->format)
+	{
+		case EXPLAIN_FORMAT_TEXT:
+			/* nothing to do */
+			break;
+
+		case EXPLAIN_FORMAT_XML:
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_JSON:
+			es->grouping_stack = lcons_int(0, es->grouping_stack);
+			es->indent += depth;
+			break;
+
+		case EXPLAIN_FORMAT_YAML:
+			if (labelname)
+				es->grouping_stack = lcons_int(1, es->grouping_stack);
+			else
+				es->grouping_stack = lcons_int(0, es->grouping_stack);
+			es->indent += depth;
+			break;
+	}
+}

Hm. It might be worthwhile to rename ExplainOpenSetAsideGroup and use it
from ExplainOpenGroup()? Seems we could just call it after
ExplainOpenGroup()'s switch, and remove all the indent/grouping_stack
related code from ExplainOpenGroup().

Greetings,

Andres Freund

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#25)
Re: Duplicate Workers entries in some EXPLAIN plans

Andres Freund <andres@anarazel.de> writes:

I wonder if we could introduce a debug GUC that makes parallel worker
acquisition just retry in a loop, for a time determined by the GUC. That
obviously would be a bad idea to do in a production setup, but it could
be good enough for regression tests? There are some deadlock dangers,
but I'm not sure they really matter for the tests.

Hmmm .... might work. Seems like a better idea than "run it by itself"
as we have to do now.

I'd personally move this into a separate function, given the patches
moves the code around already. ExplainNode() is already hard enough to
navigate...

Well, it was already inline in ExplainNode, so this just moved the
code a few lines. I'm not that excited about moving little bits of
that function out-of-line.

+/*
+ * Begin or resume output into the set-aside group for worker N.
+ */
+static void

Would it make sense to make these functions non-static? It seems
plausible that code for a custom node or such would want to add
its own information?

Maybe, but seems to me that there'd be a whole lot of other infrastructure
needed to track additional per-worker state. I'd just as soon not
expose this stuff until (a) there's a concrete not hypothetical use case
and (b) it's been around long enough to feel comfortable that there's
nothing wrong with the design.

+	/*
+	 * In TEXT format, prefix the first output line for this worker with
+	 * "Worker N:".  Then, any additional lines should be indented one more
+	 * stop than the "Worker N" line is.
+	 */

I don't quite get the Worker %d bit. Why are we not outputting that in
the !worker_inited block?

We might strip it off again in ExplainCloseWorker, and then potentially
add it back again in a later ExplainOpenWorker. That could only happen
if an earlier ExplainOpen/CloseWorker fragment decides not to emit any
text and then a later one wants to do so. Which I'm pretty sure is
unreachable right now, but I don't think this code should assume that.

+ appendStringInfoString(es->str, wstate->worker_str[i].data);

Probably never matters, but given we do have the string length already,
we could use appendBinaryStringInfo().

Ah, I was thinking about making that change but then forgot.

+ ExplainCloseGroup("Worker", NULL, true, es);

Not related to this patch: I never got why we maintain a grouping stack
for some things, but don't do it for the group properties
themselves.

Right now it'd just be extra overhead. If we ever have a case where it's
not convenient for an ExplainCloseGroup caller to provide the same data
as for ExplainOpenGroup, then I'd be on board with storing that info.

Hm. It might be worthwhile to rename ExplainOpenSetAsideGroup and use it
from ExplainOpenGroup()? Seems we could just call it after
ExplainOpenGroup()'s switch, and remove all the indent/grouping_stack
related code from ExplainOpenGroup().

Hmm. It seemed easier to me to keep them separate, but ...

I did consider a design in which, instead of ExplainOpenSetAsideGroup,
there was some function that would initialize the "state_save" area and
then you'd call the "restore" function to make that state active. It
seemed like that would be too dissimilar from ExplainOpenGroup --- but
conceivably, we could reimplement ExplainOpenGroup as calling the
initialize function and then the restore function (along with doing some
output). Not really sure that'd be an improvement though: it'd involve
less duplicate code, but the functions would individually be harder to
wrap your brain around.

regards, tom lane

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#26)
Re: Duplicate Workers entries in some EXPLAIN plans

I wrote:

Andres Freund <andres@anarazel.de> writes:

I wonder if we could introduce a debug GUC that makes parallel worker
acquisition just retry in a loop, for a time determined by the GUC. That
obviously would be a bad idea to do in a production setup, but it could
be good enough for regression tests? There are some deadlock dangers,
but I'm not sure they really matter for the tests.

Hmmm .... might work. Seems like a better idea than "run it by itself"
as we have to do now.

The more I think about this, the more it seems like a good idea, and
not only for regression test purposes. If you're about to launch a
query that will run for hours even with the max number of workers,
you don't want it to launch with less than that number just because
somebody else was eating a worker slot for a few milliseconds.

So I'm imagining a somewhat general-purpose GUC defined like
"max_delay_to_acquire_parallel_worker", measured say in milliseconds.
The default would be zero (current behavior: try once and give up),
but you could set it to small positive values if you have that kind
of production concern, while the regression tests could set it to big
positive values. This would alleviate all sorts of problems we have
with not being able to assume stable results from parallel worker
acquisition in the tests.

regards, tom lane