pg13dev: explain partial, parallel hashagg, and memory use

Started by Justin Pryzby over 5 years ago; 8 messages
#1 Justin Pryzby
pryzby@telsasoft.com

I'm testing with a customer's data on pg13dev and got output for which Peak
Memory doesn't look right/useful. I reproduced it on 565f16902.

CREATE TABLE p(i int) PARTITION BY RANGE(i);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (1000);
CREATE TABLE p2 PARTITION OF p FOR VALUES FROM (1000) TO (2000);
CREATE TABLE p3 PARTITION OF p FOR VALUES FROM (2000) TO (3000);
INSERT INTO p SELECT i%3000 FROM generate_series(1,999999) i;
VACUUM ANALYZE p;

postgres=# explain(analyze,settings) SELECT i, COUNT(1) FROM p GROUP BY 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=7469.00..14214.45 rows=2502 width=12) (actual time=489.409..514.209 rows=3000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=6469.00..12964.25 rows=1251 width=12) (actual time=476.291..477.179 rows=1000 loops=3)
         ->  HashAggregate  (cost=6487.99..6497.99 rows=1000 width=12) (actual time=474.454..475.203 rows=1000 loops=1)
               Group Key: p.i
               Batches: 1  Memory Usage: 0kB
               Worker 0:  Batches: 1  Memory Usage: 193kB
               Worker 1:  Batches: 1  Memory Usage: 0kB
               ->  Seq Scan on p1 p  (cost=0.00..4817.99 rows=333999 width=4) (actual time=0.084..100.677 rows=333999 loops=1)
         ->  HashAggregate  (cost=6469.00..6479.00 rows=1000 width=12) (actual time=468.517..469.272 rows=1000 loops=1)
               Group Key: p_1.i
               Batches: 1  Memory Usage: 0kB
               Worker 0:  Batches: 1  Memory Usage: 0kB
               Worker 1:  Batches: 1  Memory Usage: 193kB
               ->  Seq Scan on p2 p_1  (cost=0.00..4804.00 rows=333000 width=4) (actual time=0.082..102.154 rows=333000 loops=1)
         ->  HashAggregate  (cost=6469.00..6479.00 rows=1000 width=12) (actual time=485.887..486.509 rows=1000 loops=1)
               Group Key: p_2.i
               Batches: 1  Memory Usage: 193kB
               Worker 0:  Batches: 1  Memory Usage: 0kB
               Worker 1:  Batches: 1  Memory Usage: 0kB
               ->  Seq Scan on p3 p_2  (cost=0.00..4804.00 rows=333000 width=4) (actual time=0.043..104.631 rows=333000 loops=1)
Settings: effective_io_concurrency = '0', enable_partitionwise_aggregate = 'on', enable_partitionwise_join = 'on', work_mem = '127MB'

--
Justin

#2 David Rowley
dgrowleyml@gmail.com
In reply to: Justin Pryzby (#1)
Re: pg13dev: explain partial, parallel hashagg, and memory use

On Wed, 5 Aug 2020 at 13:21, Justin Pryzby <pryzby@telsasoft.com> wrote:

I'm testing with a customer's data on pg13dev and got output for which Peak
Memory doesn't look right/useful. I reproduced it on 565f16902.

Likely the sanity of those results depends on whether you think that
the Memory Usage reported outside of the workers is meant to be the
sum of all processes or the memory usage for the leader backend.

All that's going on here is that the Parallel Append is using some
parallel safe paths and giving one to each worker. The 2 workers take
the first 2 subpaths and the leader takes the third. The memory usage
reported helps confirm that's the case.
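
(As an aside, and assuming Justin's test table above: adding VERBOSE, i.e.
EXPLAIN (ANALYZE, VERBOSE), also prints per-worker row counts and timings for
each node, which makes the subpath-to-process assignment explicit rather than
something you have to infer from the memory numbers.)

EXPLAIN (ANALYZE, VERBOSE) SELECT i, COUNT(1) FROM p GROUP BY 1;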

Can you explain what you'd want to see changed about this? Or do you
want to see the non-parallel worker memory be the sum of all workers?
Sort does not seem to do that, so I'm not sure if we should consider
hash agg as an exception to that.

One thing I did notice from playing with this table is that Sort does
not show the memory used by the leader process when it didn't do any
of the work itself.

postgres=# set parallel_leader_participation =off;
SET
postgres=# explain analyze select i from p group by i;

->  Sort  (cost=59436.92..60686.92 rows=500000 width=4) (actual time=246.836..280.985 rows=500000 loops=2)
      Sort Key: p.i
      Worker 0:  Sort Method: quicksort  Memory: 27898kB
      Worker 1:  Sort Method: quicksort  Memory: 55842kB

Whereas with the leader helping out we get:

->  Sort  (cost=51284.39..52326.05 rows=416666 width=4) (actual time=191.814..213.418 rows=333333 loops=3)
      Sort Key: p.i
      Sort Method: quicksort  Memory: 33009kB
      Worker 0:  Sort Method: quicksort  Memory: 25287kB
      Worker 1:  Sort Method: quicksort  Memory: 25445kB

Maybe we should do the same for hash agg when the leader didn't assist?
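
(If anyone wants to check, the hash agg equivalent should be reproducible on
Justin's test table with his original settings, along the lines of:)

SET parallel_leader_participation = off;
EXPLAIN (ANALYZE) SELECT i, COUNT(1) FROM p GROUP BY 1;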

David

#3 James Coleman
jtc331@gmail.com
In reply to: David Rowley (#2)
Re: pg13dev: explain partial, parallel hashagg, and memory use

On Tue, Aug 4, 2020 at 9:44 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 5 Aug 2020 at 13:21, Justin Pryzby <pryzby@telsasoft.com> wrote:

I'm testing with a customer's data on pg13dev and got output for which Peak
Memory doesn't look right/useful. I reproduced it on 565f16902.

Likely the sanity of those results depends on whether you think that
the Memory Usage reported outside of the workers is meant to be the
sum of all processes or the memory usage for the leader backend.

All that's going on here is that the Parallel Append is using some
parallel safe paths and giving one to each worker. The 2 workers take
the first 2 subpaths and the leader takes the third. The memory usage
reported helps confirm that's the case.

Can you explain what you'd want to see changed about this? Or do you
want to see the non-parallel worker memory be the sum of all workers?
Sort does not seem to do that, so I'm not sure if we should consider
hash agg as an exception to that.

I've always found the way we report parallel workers in EXPLAIN quite
confusing. I realize it matches the actual implementation model (the
leader often is also "another worker"), but I think the natural
expectation from a user perspective would be that you'd show as
workers all backends (including the leader) that did work, and then
aggregate into a summary line (where the leader is displayed now).

In the current output there's nothing really to hint to the user that
the model is leader + workers and that the "summary" line is really
the leader. If I were to design this from scratch, I'd want to propose
doing what I said above (summary aggregate line + treat leader as a
worker line, likely with a "leader" tag), but that seems like a big
change to make now. On the other hand, perhaps designating what looks
like a summary line as the "leader" or some such would help clear up
the confusion? Perhaps it could also say "Participating" or
"Non-participating"?

James

#4 Justin Pryzby
pryzby@telsasoft.com
In reply to: David Rowley (#2)
Re: pg13dev: explain partial, parallel hashagg, and memory use

On Wed, Aug 05, 2020 at 01:44:17PM +1200, David Rowley wrote:

On Wed, 5 Aug 2020 at 13:21, Justin Pryzby <pryzby@telsasoft.com> wrote:

I'm testing with a customer's data on pg13dev and got output for which Peak
Memory doesn't look right/useful. I reproduced it on 565f16902.

Likely the sanity of those results depends on whether you think that
the Memory Usage reported outside of the workers is meant to be the
sum of all processes or the memory usage for the leader backend.

All that's going on here is that the Parallel Append is using some
parallel safe paths and giving one to each worker. The 2 workers take
the first 2 subpaths and the leader takes the third. The memory usage
reported helps confirm that's the case.

I'm not sure there's a problem, but all the 0kB values looked suspicious to me.

I think you're saying that one worker alone handled each HashAgg, and the other
worker (and leader) show 0kB. I guess in my naive thinking it's odd to show a
worker which wasn't active for that subpath (at least in text output). But I
don't know the expected behavior of parallel hashagg, so that explains most of
my confusion.

On Tue, Aug 04, 2020 at 10:01:18PM -0400, James Coleman wrote:

Perhaps it could also say "Participating" or "Non-participating"?

Yes, that'd help me a lot :)

Also odd (to me). If I encourage more workers, there are "slots" for each
"planned" worker, even though fewer were launched:

postgres=# ALTER TABLE p3 SET (parallel_workers=11);
postgres=# SET max_parallel_workers_per_gather=11;
 Finalize HashAggregate  (cost=10299.64..10329.64 rows=3000 width=12) (actual time=297.793..299.933 rows=3000 loops=1)
   Group Key: p.i
   Batches: 1  Memory Usage: 625kB
   ->  Gather  (cost=2928.09..10134.64 rows=33000 width=12) (actual time=233.398..282.429 rows=13000 loops=1)
         Workers Planned: 11
         Workers Launched: 7
         ->  Parallel Append  (cost=1928.09..5834.64 rows=3000 width=12) (actual time=214.358..232.980 rows=1625 loops=8)
               ->  Partial HashAggregate  (cost=1933.46..1943.46 rows=1000 width=12) (actual time=167.936..171.345 rows=1000 loops=4)
                     Group Key: p.i
                     Batches: 1  Memory Usage: 0kB
                     Worker 0:  Batches: 1  Memory Usage: 193kB
                     Worker 1:  Batches: 1  Memory Usage: 193kB
                     Worker 2:  Batches: 1  Memory Usage: 0kB
                     Worker 3:  Batches: 1  Memory Usage: 0kB
                     Worker 4:  Batches: 1  Memory Usage: 193kB
                     Worker 5:  Batches: 1  Memory Usage: 0kB
                     Worker 6:  Batches: 1  Memory Usage: 193kB
                     Worker 7:  Batches: 0  Memory Usage: 0kB
                     Worker 8:  Batches: 0  Memory Usage: 0kB
                     Worker 9:  Batches: 0  Memory Usage: 0kB
                     Worker 10:  Batches: 0  Memory Usage: 0kB

Thanks,
--
Justin

#5 David Rowley
dgrowleyml@gmail.com
In reply to: Justin Pryzby (#4)
Re: pg13dev: explain partial, parallel hashagg, and memory use

On Wed, 5 Aug 2020 at 14:13, Justin Pryzby <pryzby@telsasoft.com> wrote:

Also odd (to me). If I encourage more workers, there are "slots" for each
"planned" worker, even though fewer were launched:

Searching explain.c for "num_workers; " (including the trailing space) and
looking at each for loop that loops over the workers, quite a number of
those locations have a condition that skips the worker.

For example, show_sort_info() does

if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
continue; /* ignore any unfilled slots */

So maybe Hash Agg should be doing something similar. Additionally,
maybe it should not show the leader details if the leader didn't help.

David

#6 David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#5)
1 attachment(s)
Re: pg13dev: explain partial, parallel hashagg, and memory use

On Wed, 5 Aug 2020 at 14:27, David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 5 Aug 2020 at 14:13, Justin Pryzby <pryzby@telsasoft.com> wrote:

Also odd (to me). If I encourage more workers, there are "slots" for each
"planned" worker, even though fewer were launched:

Searching explain.c for "num_workers; " (including the trailing space) and
looking at each for loop that loops over the workers, quite a number of
those locations have a condition that skips the worker.

For example, show_sort_info() does

if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
continue; /* ignore any unfilled slots */

So maybe Hash Agg should be doing something similar. Additionally,
maybe it should not show the leader details if the leader didn't help.

Here's what I had in mind.

The unpatched format got even more broken with EXPLAIN (ANALYZE,
VERBOSE), so this is certainly a bug fix.

David

Attachments:

dont_display_hashagg_properties_for_workers_that_dont_assist.patch (application/octet-stream)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1e565fd337..30e0a7ee7f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3063,15 +3063,19 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 			ExplainPropertyInteger("Planned Partitions", NULL,
 								   aggstate->hash_planned_partitions, es);
 
-		if (!es->analyze)
-			return;
-
-		/* EXPLAIN ANALYZE */
-		ExplainPropertyInteger("HashAgg Batches", NULL,
-							   aggstate->hash_batches_used, es);
-		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
-		ExplainPropertyInteger("Disk Usage", "kB",
-							   aggstate->hash_disk_used, es);
+		/*
+		 * During parallel query the leader may have not helped out.  We
+		 * detect this by checking how much memory it used.  If we find it
+		 * didn't do any work then we don't show its properties.
+		 */
+		if (es->analyze && aggstate->hash_mem_peak > 0)
+		{
+			ExplainPropertyInteger("HashAgg Batches", NULL,
+								   aggstate->hash_batches_used, es);
+			ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+			ExplainPropertyInteger("Disk Usage", "kB",
+								   aggstate->hash_disk_used, es);
+		}
 	}
 	else
 	{
@@ -3085,26 +3089,32 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 			gotone = true;
 		}
 
-		if (!es->analyze)
+		/*
+		 * During parallel query the leader may have not helped out.  We
+		 * detect this by checking how much memory it used.  If we find it
+		 * didn't do any work then we don't show its properties.
+		 */
+		if (es->analyze && aggstate->hash_mem_peak > 0)
 		{
-			if (gotone)
-				appendStringInfoChar(es->str, '\n');
-			return;
-		}
+			if (!gotone)
+				ExplainIndentText(es);
+			else
+				appendStringInfoString(es->str, "  ");
 
-		if (!gotone)
-			ExplainIndentText(es);
-		else
-			appendStringInfoString(es->str, "  ");
+			appendStringInfo(es->str, "Batches: %d  Memory Usage: " INT64_FORMAT "kB",
+							 aggstate->hash_batches_used, memPeakKb);
+			gotone = true;
 
-		appendStringInfo(es->str, "Batches: %d  Memory Usage: " INT64_FORMAT "kB",
-						 aggstate->hash_batches_used, memPeakKb);
+			/* Only display disk usage if we spilled to disk */
+			if (aggstate->hash_batches_used > 1)
+			{
+				appendStringInfo(es->str, "  Disk Usage: " UINT64_FORMAT "kB",
+					aggstate->hash_disk_used);
+			}
+		}
 
-		/* Only display disk usage if we spilled to disk */
-		if (aggstate->hash_batches_used > 1)
-			appendStringInfo(es->str, "  Disk Usage: " UINT64_FORMAT "kB",
-							 aggstate->hash_disk_used);
-		appendStringInfoChar(es->str, '\n');
+		if (gotone)
+			appendStringInfoChar(es->str, '\n');
 	}
 
 	/* Display stats for each parallel worker */
@@ -3117,6 +3127,9 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 			int			hash_batches_used;
 
 			sinstrument = &aggstate->shared_info->sinstrument[n];
+			/* Skip workers that didn't do anything */
+			if (sinstrument->hash_mem_peak == 0)
+				continue;
 			hash_disk_used = sinstrument->hash_disk_used;
 			hash_batches_used = sinstrument->hash_batches_used;
 			memPeakKb = (sinstrument->hash_mem_peak + 1023) / 1024;
#7 David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#6)
Re: pg13dev: explain partial, parallel hashagg, and memory use

On Wed, 5 Aug 2020 at 17:25, David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 5 Aug 2020 at 14:27, David Rowley <dgrowleyml@gmail.com> wrote:

So maybe Hash Agg should be doing something similar. Additionally,
maybe it should not show the leader details if the leader didn't help.

Here's what I had in mind.

Just coming back to this. I'd like to push it soon, but it's currently
late here. I'll look at pushing it in my morning in about 8 hours
time.

If anyone has any comments please let me know before then.

David

#8 David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#7)
Re: pg13dev: explain partial, parallel hashagg, and memory use

On Fri, 7 Aug 2020 at 00:44, David Rowley <dgrowleyml@gmail.com> wrote:

Just coming back to this. I'd like to push it soon, but it's currently
late here. I'll look at pushing it in my morning in about 8 hours
time.

Pushed.

David