Explain: Duplicate key "Workers" in JSON format

Started by Pierre Giraudover 6 years ago7 messages
#1Pierre Giraud
pierre.giraud@dalibo.com

Hi,

I'm currently working on a tool to visualize an execution plan [1]https://dalibo.github.io/pev2/. For
those who know PEV, it's actually a fork of this great tool since it
hasn't been active for more than 2 years.

Among other things, I'd like to show information for the parallel
queries. First of which is information about the workers.

I'm facing a problem when I am trying to parse a plan in the JSON
format. The "Workers" key may be duplicated.

While it's not invalid to have keys with the same name several times at
the same level in a JSON object, it makes it almost impossible to get
the full info when parsed. Indeed when parsing such a JSON string only
the last key is kept. Part of the information is lost.

JSON validators warn us with the following message : "Duplicate key,
names should be unique."

Here's an example of a plan in VERBOSE mode.

[
{
"Plan": {
"Node Type": "Gather Merge",
"Parallel Aware": false,
"Actual Startup Time": 1720.052,
"Actual Total Time": 4252.290,
"Actual Rows": 10000000,
"Actual Loops": 1,
"Output": ["c1", "c2"],
"Workers Planned": 2,
"Workers Launched": 2,
"Plans": [
{
"Node Type": "Sort",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Actual Startup Time": 1558.638,
"Actual Total Time": 2127.522,
"Actual Rows": 3333333,
"Actual Loops": 3,
"Output": ["c1", "c2"],
"Sort Key": ["t1.c1"],
"Sort Method": "external merge",
"Sort Space Used": 126152,
"Sort Space Type": "Disk",
"Workers": [
{
"Worker Number": 0,
"Sort Method": "external merge",
"Sort Space Used": 73552,
"Sort Space Type": "Disk"
},
{
"Worker Number": 1,
"Sort Method": "external merge",
"Sort Space Used": 73320,
"Sort Space Type": "Disk"
}
],
"Workers": [
{
"Worker Number": 0,
"Actual Startup Time": 1487.846,
"Actual Total Time": 1996.879,
"Actual Rows": 2692973,
"Actual Loops": 1
},
{
"Worker Number": 1,
"Actual Startup Time": 1468.256,
"Actual Total Time": 2012.744,
"Actual Rows": 2684443,
"Actual Loops": 1
}
],
"Plans": [
{
"Node Type": "Seq Scan",
"Parent Relationship": "Outer",
"Parallel Aware": true,
"Relation Name": "t1",
"Schema": "public",
"Alias": "t1",
"Actual Startup Time": 0.211,
"Actual Total Time": 372.858,
"Actual Rows": 3333333,
"Actual Loops": 3,
"Output": ["c1", "c2"],
"Workers": [
{
"Worker Number": 0,
"Actual Startup Time": 0.029,
"Actual Total Time": 368.356,
"Actual Rows": 2692973,
"Actual Loops": 1
},
{
"Worker Number": 1,
"Actual Startup Time": 0.033,
"Actual Total Time": 368.874,
"Actual Rows": 2684443,
"Actual Loops": 1
}
]
}
]
}
]
},
"Planning Time": 0.170,
"Triggers": [
],
"Execution Time": 4695.141
}
]

As you can see, the "Workers" key is duplicated in the Sort node.

Here's the equivalent in TEXT format:

---------------------------------
Gather Merge (cost=735306.27..1707599.95 rows=8333364 width=17)
(actual time=1560.468..3749.583 rows=10000000 loops=1)
Output: c1, c2
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=734306.25..744722.95 rows=4166682 width=17) (actual
time=1474.182..1967.788 rows=3333333 loops=3)
Output: c1, c2
Sort Key: t1.c1
Sort Method: external merge Disk: 125168kB
Worker 0: Sort Method: external merge Disk: 73768kB
Worker 1: Sort Method: external merge Disk: 74088kB
Worker 0: actual time=1431.136..1883.370 rows=2700666 loops=1
Worker 1: actual time=1431.175..1891.630 rows=2712505 loops=1
-> Parallel Seq Scan on public.t1 (cost=0.00..105264.82
rows=4166682 width=17) (actual time=0.214..386.014 rows=3333333 loops=3)
Output: c1, c2
Worker 0: actual time=0.027..382.325 rows=2700666 loops=1
Worker 1: actual time=0.038..384.951 rows=2712505 loops=1
Planning Time: 0.180 ms
Execution Time: 4166.867 ms
(18 rows)
---------------------------------

I think that the text format should stay as is.

For the JSON format however it would be better in my opinion if
"Workers" data is merged. Parsing should not imply anything else than
"var myObj = JSON.parse(theJsonString);".

What do you think?

Thanks.

[1]: https://dalibo.github.io/pev2/

#2Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Pierre Giraud (#1)
Re: Explain: Duplicate key "Workers" in JSON format

On 8/23/19 8:47 AM, Pierre Giraud wrote:

"Plans": [
{
"Node Type": "Sort",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Actual Startup Time": 1558.638,
"Actual Total Time": 2127.522,
"Actual Rows": 3333333,
"Actual Loops": 3,
"Output": ["c1", "c2"],
"Sort Key": ["t1.c1"],
"Sort Method": "external merge",
"Sort Space Used": 126152,
"Sort Space Type": "Disk",
"Workers": [
{
"Worker Number": 0,
"Sort Method": "external merge",
"Sort Space Used": 73552,
"Sort Space Type": "Disk"
},
{
"Worker Number": 1,
"Sort Method": "external merge",
"Sort Space Used": 73320,
"Sort Space Type": "Disk"
}
],
"Workers": [
{
"Worker Number": 0,
"Actual Startup Time": 1487.846,
"Actual Total Time": 1996.879,
"Actual Rows": 2692973,
"Actual Loops": 1
},
{
"Worker Number": 1,
"Actual Startup Time": 1468.256,
"Actual Total Time": 2012.744,
"Actual Rows": 2684443,
"Actual Loops": 1
}
],

Yes, that's really awful.

...

I think that the text format should stay as is.

For the JSON format however it would be better in my opinion if
"Workers" data is merged. Parsing should not imply anything else than
"var myObj = JSON.parse(theJsonString);".

What do you think?

Maybe a simpler fix would be to rename one set of nodes to
"Sort-Workers" or some such.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In reply to: Andrew Dunstan (#2)
1 attachment(s)
Re: Explain: Duplicate key "Workers" in JSON format

Hi hackers,

On Fri, 23 Aug 2019 09:30:59 -0400
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

On 8/23/19 8:47 AM, Pierre Giraud wrote:

"Plans": [
{
"Node Type": "Sort",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Actual Startup Time": 1558.638,
"Actual Total Time": 2127.522,
"Actual Rows": 3333333,
"Actual Loops": 3,
"Output": ["c1", "c2"],
"Sort Key": ["t1.c1"],
"Sort Method": "external merge",
"Sort Space Used": 126152,
"Sort Space Type": "Disk",
"Workers": [
{
"Worker Number": 0,
"Sort Method": "external merge",
"Sort Space Used": 73552,
"Sort Space Type": "Disk"
},
{
"Worker Number": 1,
"Sort Method": "external merge",
"Sort Space Used": 73320,
"Sort Space Type": "Disk"
}
],
"Workers": [
{
"Worker Number": 0,
"Actual Startup Time": 1487.846,
"Actual Total Time": 1996.879,
"Actual Rows": 2692973,
"Actual Loops": 1
},
{
"Worker Number": 1,
"Actual Startup Time": 1468.256,
"Actual Total Time": 2012.744,
"Actual Rows": 2684443,
"Actual Loops": 1
}
],

Yes, that's really awful.

...

I think that the text format should stay as is.

For the JSON format however it would be better in my opinion if
"Workers" data is merged. Parsing should not imply anything else than
"var myObj = JSON.parse(theJsonString);".

What do you think?

Please, find in attachment a patch that merge parallel data together.

Note that duplicated keys are discouraged in json specs, but they are forbidden
in yaml ones. See link in the commit message.

Regards,

Attachments:

0001-Merge-parallel-infos-for-each-worker-in-explain.patchtext/x-patchDownload
From 37595f7eaf7650672e38aa011a5400f5c94c70bf Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Date: Thu, 5 Dec 2019 17:41:17 +0100
Subject: [PATCH] Merge parallel infos for each worker in explain

Explain output was building two different group for parallel
general infos and parallel sort infos for each workers.

This is fine for text output, but not for json where dupplicate
keys are discouraged, neither in yaml were it is an error according
to the specs: https://yaml.org/spec/1.2/spec.html#id2759572.

Reported-by: Pierre Giraud
Discussion: https://www.postgresql.org/message-id/flat/41ee53a5-a36e-cc8f-1bee-63f6565bb1ee%40dalibo.com
Discussion: https://www.postgresql.org/message-id/flat/941.1571722493%40sss.pgh.pa.us#36db9545b24929d39639f6bd79390a06
---
 src/backend/commands/explain.c | 89 ++++++++++++++++++----------------
 1 file changed, 48 insertions(+), 41 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62fb3434a3..10f985a4cb 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -101,6 +101,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 							 List *ancestors, ExplainState *es);
 static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_parallel_sort_info(SortState *sortstate, ExplainState *es, int n);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1904,6 +1905,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				es->indent++;
 				if (es->buffers)
 					show_buffer_usage(es, &instrument->bufusage);
+
+				if (nodeTag(plan) == T_Sort)
+					show_parallel_sort_info(castNode(SortState, planstate), es, n);
+
 				es->indent--;
 			}
 			else
@@ -1929,6 +1934,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				if (es->buffers)
 					show_buffer_usage(es, &instrument->bufusage);
 
+				if (nodeTag(plan) == T_Sort)
+					show_parallel_sort_info(castNode(SortState, planstate), es, n);
+
 				ExplainCloseGroup("Worker", NULL, true, es);
 			}
 		}
@@ -2565,50 +2573,49 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 			ExplainPropertyText("Sort Space Type", spaceType, es);
 		}
 	}
+}
 
-	if (sortstate->shared_info != NULL)
-	{
-		int			n;
-		bool		opened_group = false;
+/*
+ * If it's EXPLAIN ANALYZE, show tuplesort stats for a parallel sort node
+ */
+static void
+show_parallel_sort_info(SortState *sortstate, ExplainState *es, int n)
+{
+	TuplesortInstrumentation *sinstrument;
+	const char *sortMethod;
+	const char *spaceType;
+	long		spaceUsed;
 
-		for (n = 0; n < sortstate->shared_info->num_workers; n++)
-		{
-			TuplesortInstrumentation *sinstrument;
-			const char *sortMethod;
-			const char *spaceType;
-			long		spaceUsed;
-
-			sinstrument = &sortstate->shared_info->sinstrument[n];
-			if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
-				continue;		/* ignore any unfilled slots */
-			sortMethod = tuplesort_method_name(sinstrument->sortMethod);
-			spaceType = tuplesort_space_type_name(sinstrument->spaceType);
-			spaceUsed = sinstrument->spaceUsed;
+	if (!es->analyze)
+		return;
 
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				appendStringInfoSpaces(es->str, es->indent * 2);
-				appendStringInfo(es->str,
-								 "Worker %d:  Sort Method: %s  %s: %ldkB\n",
-								 n, sortMethod, spaceType, spaceUsed);
-			}
-			else
-			{
-				if (!opened_group)
-				{
-					ExplainOpenGroup("Workers", "Workers", false, es);
-					opened_group = true;
-				}
-				ExplainOpenGroup("Worker", NULL, true, es);
-				ExplainPropertyInteger("Worker Number", NULL, n, es);
-				ExplainPropertyText("Sort Method", sortMethod, es);
-				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
-				ExplainPropertyText("Sort Space Type", spaceType, es);
-				ExplainCloseGroup("Worker", NULL, true, es);
-			}
-		}
-		if (opened_group)
-			ExplainCloseGroup("Workers", "Workers", false, es);
+	if (! sortstate->sort_Done || sortstate->tuplesortstate == NULL)
+		return;
+
+	if (sortstate->shared_info == NULL)
+		return;
+
+	sinstrument = &sortstate->shared_info->sinstrument[n];
+
+	if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
+		return;		/* ignore any unfilled slots */
+
+	sortMethod = tuplesort_method_name(sinstrument->sortMethod);
+	spaceType = tuplesort_space_type_name(sinstrument->spaceType);
+	spaceUsed = sinstrument->spaceUsed;
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		appendStringInfoSpaces(es->str, es->indent * 2);
+		appendStringInfo(es->str,
+						 "Sort Method: %s  %s: %ldkB\n",
+						 sortMethod, spaceType, spaceUsed);
+	}
+	else
+	{
+		ExplainPropertyText("Sort Method", sortMethod, es);
+		ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
+		ExplainPropertyText("Sort Space Type", spaceType, es);
 	}
 }
 
-- 
2.20.1

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jehan-Guillaume de Rorthais (#3)
Re: Explain: Duplicate key "Workers" in JSON format

We have not fixed this, have we?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#4)
Re: Explain: Duplicate key "Workers" in JSON format

On 2020-03-31 14:06:50 -0300, Alvaro Herrera wrote:

We have not fixed this, have we?

I thought we did:

commit 10013684970453a0ddc86050bba813c611114321
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2020-01-25 18:16:42 -0500

Clean up EXPLAIN's handling of per-worker details.

Previously, it was possible for EXPLAIN ANALYZE of a parallel query
to produce several different "Workers" fields for a single plan node,
because different portions of explain.c independently generated
per-worker data and wrapped that output in separate fields. This
is pretty bogus, especially for the structured output formats: even
if it's not technically illegal, most programs would have a hard time
dealing with such data.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: Explain: Duplicate key "Workers" in JSON format

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

We have not fixed this, have we?

Isn't this 10013684970453a0ddc86050bba813c611114321 ?

regards, tom lane

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#6)
Re: Explain: Duplicate key "Workers" in JSON format

On 2020-Mar-31, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

We have not fixed this, have we?

Isn't this 10013684970453a0ddc86050bba813c611114321 ?

Ah, right, another thread reporting the same thing, two months after
this one.

Thanks both :-)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services