Result Cache node shows per-worker info even for workers not launched
Hi,
If planned parallel workers do not get launched, the Result Cache plan
node shows all-0 stats for each of those workers:
tpch=# set max_parallel_workers TO 0;
SET
tpch=# explain analyze
select avg(l_discount) from orders, lineitem
where
l_orderkey = o_orderkey
and o_orderdate < date '1995-03-09'
and l_shipdate > date '1995-03-09';
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=315012.87..315012.88 rows=1 width=32) (actual time=27533.482..27533.598 rows=1 loops=1)
   ->  Gather  (cost=315012.44..315012.85 rows=4 width=32) (actual time=27533.471..27533.587 rows=1 loops=1)
         Workers Planned: 4
         Workers Launched: 0
         ->  Partial Aggregate  (cost=314012.44..314012.45 rows=1 width=32) (actual time=27533.177..27533.178 rows=1 loops=1)
               ->  Nested Loop  (cost=0.44..309046.68 rows=1986303 width=4) (actual time=0.400..27390.835 rows=748912 loops=1)
                     ->  Parallel Seq Scan on lineitem  (cost=0.00..154513.66 rows=4120499 width=12) (actual time=0.044..7910.399 rows=16243662 loops=1)
                           Filter: (l_shipdate > '1995-03-09'::date)
                           Rows Removed by Filter: 13756133
                     ->  Result Cache  (cost=0.44..0.53 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=16243662)
                           Cache Key: lineitem.l_orderkey
                           Hits: 12085749  Misses: 4157913  Evictions: 3256424  Overflows: 0  Memory Usage: 65537kB
                           Worker 0:  Hits: 0  Misses: 0  Evictions: 0  Overflows: 0  Memory Usage: 0kB
                           Worker 1:  Hits: 0  Misses: 0  Evictions: 0  Overflows: 0  Memory Usage: 0kB
                           Worker 2:  Hits: 0  Misses: 0  Evictions: 0  Overflows: 0  Memory Usage: 0kB
                           Worker 3:  Hits: 0  Misses: 0  Evictions: 0  Overflows: 0  Memory Usage: 0kB
                           ->  Index Scan using orders_pkey on orders  (cost=0.43..0.52 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=4157913)
                                 Index Cond: (o_orderkey = lineitem.l_orderkey)
                                 Filter: (o_orderdate < '1995-03-09'::date)
                                 Rows Removed by Filter: 1
 Planning Time: 0.211 ms
 Execution Time: 27553.477 ms
(22 rows)
Looking at other cases, such as show_sort_info() and the printing of
per-worker JIT info, I could see that the general policy is to skip
printing info for workers that were not launched. Attached is a patch
that does the same for Result Cache.
I was earlier thinking about using (instrument[n].nloops == 0) to
check for not-launched workers. But we are already using "if
(rcstate->stats.cache_misses == 0)" for the leader process, so for
consistency I used the same method for workers.
--
Thanks,
-Amit Khandekar
Huawei Technologies
Attachment: skip_notlaunched_workers.patch
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ede8cec947..60c74dc4f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3168,6 +3168,17 @@ show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
si = &rcstate->shared_info->sinstrument[n];
+ /*
+ * Skip workers that didn't launch. The assumption is that the result
+ * cache's subplan is never a partial plan, so the assumption for the
+ * leader that there will always be at least one cache miss holds true
+ * for workers also. If result cache starts accepting partial subplans,
+ * then zero cache_misses won't necessarily mean the worker was not
+ * launched.
+ */
+ if (si->cache_misses == 0)
+ continue;
+
if (es->workers_state)
ExplainOpenWorker(n, es);
On Wed, 28 Apr 2021 at 00:39, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> If planned parallel workers do not get launched, the Result Cache plan
> node shows all-0 stats for each of those workers:
Thanks for reporting this and for the patch.
You're right that there is a problem here. I did in fact have code to
skip workers that didn't have any cache misses right up until v18 of
the patch [1], but I removed it because I was getting some test
instability in the partition_prune regression tests. That was
highlighted by the CFbot machines. I mentioned that in the final
paragraph of [2]. I didn't mention the exact test there, but I was
talking about the test in partition_prune.sql.
By the time it came to b6002a796, I did end up changing the
partition_prune tests. However, I had to back that version out again
because of some problems with force_parallel_mode = regress buildfarm
animals. By the time I re-committed Result Cache in 9eacee2e6, I had
changed the partition_prune tests so they did SET enable_resultcache =
0 before running that parallel test. I'd basically decided that the
test was never going to be stable on the buildfarm.
The problem there was that the results would vary depending on if the
parallel worker managed to do anything before the main process did all
the work. Because the tests are pretty small scale, many machines
wouldn't manage to get their worker(s) in gear and running before the
main process had finished the test. This was the reason I removed the
cache_misses == 0 test in explain.c. I'd thought that I could make
that test stable by just always outputting the cache stats line from
workers, even if they didn't assist during execution.
So, given that I removed the parallel test in partition_prune.sql, and
don't have any EXPLAIN ANALYZE output for parallel tests in
resultcache.sql, it should be safe enough to put that cache_misses ==
0 test back into explain.c
I've attached a patch to do this. The explain.c part is pretty similar
to your patch, I just took my original code and comment.
The patch also removes the SET force_parallel_mode = off in
resultcache.sql. That's no longer needed due to adding this check in
explain.c again. I also removed the changes I made to the
explain_parallel_append function in partition_prune.sql. I shouldn't
have included those in 9eacee2e6.
I plan to push this in the next 24 hours or so.
David
[1]: /messages/by-id/CAApHDvoOmTtNPoF-+Q1dAOMa8vWFsFbyQb_A0iUKTS5nf2DuLw@mail.gmail.com
[2]: /messages/by-id/CAApHDvrz4f+i1wu-8hyqJ=pxYDroGA5Okgo5rWPOj47RZ6QTmQ@mail.gmail.com
Attachment: resultcache_explain_fix.patch
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 3d5198e234..336b4f22ac 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3186,6 +3186,13 @@ show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
si = &rcstate->shared_info->sinstrument[n];
+ /*
+ * Skip workers that didn't do any work. We needn't bother checking
+ * for cache hits as a miss will always occur before a cache hit.
+ */
+ if (si->cache_misses == 0)
+ continue;
+
if (es->workers_state)
ExplainOpenWorker(n, es);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 1a7149bfd5..2c62e4a7a6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1958,9 +1958,6 @@ begin
ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
- ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
- ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
- ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
return next ln;
end loop;
end;
diff --git a/src/test/regress/expected/resultcache.out b/src/test/regress/expected/resultcache.out
index 7f4bf95f3f..5b5dd6838e 100644
--- a/src/test/regress/expected/resultcache.out
+++ b/src/test/regress/expected/resultcache.out
@@ -31,9 +31,6 @@ $$;
-- Ensure we get a result cache on the inner side of the nested loop
SET enable_hashjoin TO off;
SET enable_bitmapscan TO off;
--- force_parallel_mode = regress can cause some instability in EXPLAIN ANALYZE
--- output, so let's ensure that we turn it off.
-SET force_parallel_mode TO off;
SELECT explain_resultcache('
SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
@@ -118,7 +115,6 @@ WHERE t2.unique1 < 1200;', true);
RESET enable_mergejoin;
RESET work_mem;
-RESET force_parallel_mode;
RESET enable_bitmapscan;
RESET enable_hashjoin;
-- Test parallel plans with Result Cache.
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 247264f93b..16c8dc5f1f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -464,9 +464,6 @@ begin
ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
- ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
- ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
- ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
return next ln;
end loop;
end;
diff --git a/src/test/regress/sql/resultcache.sql b/src/test/regress/sql/resultcache.sql
index 3fede90b00..43a70d56a5 100644
--- a/src/test/regress/sql/resultcache.sql
+++ b/src/test/regress/sql/resultcache.sql
@@ -33,9 +33,7 @@ $$;
-- Ensure we get a result cache on the inner side of the nested loop
SET enable_hashjoin TO off;
SET enable_bitmapscan TO off;
--- force_parallel_mode = regress can cause some instability in EXPLAIN ANALYZE
--- output, so let's ensure that we turn it off.
-SET force_parallel_mode TO off;
+
SELECT explain_resultcache('
SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.twenty
@@ -69,7 +67,6 @@ INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
WHERE t2.unique1 < 1200;', true);
RESET enable_mergejoin;
RESET work_mem;
-RESET force_parallel_mode;
RESET enable_bitmapscan;
RESET enable_hashjoin;
On Wed, Apr 28, 2021 at 1:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
> I plan to push this in the next 24 hours or so.
I happened to see explain_resultcache in resultcache.sql; it seems two
of the tests still have numbers for cache hits and misses - Hits: 980
Misses: 20. Won't these make the tests unstable? Will these numbers be
the same across machines? Or is it that no buildfarm member has caught
these? The comment below says that the hits and misses are not the same
across machines:
-- Ensure we get some evictions. We're unable to validate the hits and misses
-- here as the number of entries that fit in the cache at once will vary
-- between different machines.
Should we remove the hide_hitmiss parameter in explain_resultcache and
always print N for non-zero and Zero for zero hits and misses? That
would clearly show whether we have zero or non-zero hits or misses.
Am I missing something?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, 28 Apr 2021 at 15:08, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Wed, Apr 28, 2021 at 1:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> I plan to push this in the next 24 hours or so.
> I happened to see explain_resultcache in resultcache.sql; it seems two
> of the tests still have numbers for cache hits and misses - Hits: 980
> Misses: 20. Won't these make the tests unstable? Will these numbers be
> the same across machines? Or is it that no buildfarm member has caught
> these? The comment below says that the hits and misses are not the same
> across machines:
>
> -- Ensure we get some evictions. We're unable to validate the hits and misses
> -- here as the number of entries that fit in the cache at once will vary
> -- between different machines.
>
> Should we remove the hide_hitmiss parameter in explain_resultcache and
> always print N for non-zero and Zero for zero hits and misses? That
> would clearly show whether we have zero or non-zero hits or misses.
>
> Am I missing something?
I believe the assumption here is that, with no workers involved, the
cache misses and hits are guaranteed to be exactly the same everywhere.
On Wed, 28 Apr 2021 at 21:38, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Wed, Apr 28, 2021 at 1:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> I plan to push this in the next 24 hours or so.
>
> I happened to see explain_resultcache in resultcache.sql; it seems two
> of the tests still have numbers for cache hits and misses - Hits: 980
> Misses: 20. Won't these make the tests unstable? Will these numbers be
> the same across machines? Or is it that no buildfarm member has caught
> these? The comment below says that the hits and misses are not the same
> across machines:
>
> -- Ensure we get some evictions. We're unable to validate the hits and misses
> -- here as the number of entries that fit in the cache at once will vary
> -- between different machines.
The only reason it would be unstable is if there are cache evictions.
Evictions will only happen if the cache fills up and we need to make
way for new entries. A 32-bit machine, for example, will use slightly
less memory for caching items, so the number of evictions is going to
be a bit lower on those machines. Having an unstable number of
evictions will cause the hits and misses to be unstable too.
Otherwise, the number of misses is predictable: it'll be the number of
distinct sets of parameters that we look up in the cache. Any repeats
will be a hit. So hits plus misses should just add up to the number
of times that a normal parameterized nested loop would execute the
inner side, and that's predictable too. It would only change if you
change the query or the data in the table.
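(As an example, in the plan near the top of this thread, Hits: 12085749
plus Misses: 4157913 adds up to 16243662, which matches the
loops=16243662 shown on the Result Cache node, i.e. the number of times
the inner side was executed.)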
> Should we remove the hide_hitmiss parameter in explain_resultcache and
> always print N for non-zero and Zero for zero hits and misses? That
> would clearly show whether we have zero or non-zero hits or misses.
I added that because if there are no evictions then the hits and
misses should be perfectly stable, providing the test is small enough
not to exceed work_mem and fill the cache. If someone was to run the
tests with a small work_mem, then there would be no shortage of other
tests that would fail due to plan changes. These tests were designed
to be small enough so there's no danger of getting close to work_mem
and filling the cache.
However, I did add one test that sets work_mem down to 64kB to ensure
the eviction code does get some exercise. You'll notice that I pass
"true" to explain_resultcache() to hide the hits and misses there. We
can't test the exact number of hits/misses/evictions there, but we can
at least tell zero from non-zero, since I coded explain_resultcache()
to replace the count with Zero or N depending on whether it was zero or
above zero.
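For illustration only, a rough sketch of that kind of wrapper is below.
The function name is hypothetical, and the EXPLAIN options and regexes
shown are assumptions rather than the exact contents of
explain_resultcache() in resultcache.sql; it just shows how a plpgsql
wrapper can keep the zero/non-zero distinction while masking the
machine-dependent counts:

create or replace function explain_resultcache_sketch(query text, hide_hitmiss bool)
returns setof text language plpgsql as
$$
declare
    ln text;
begin
    -- Run EXPLAIN ANALYZE on the given query and post-process each output line.
    for ln in
        execute format('explain (analyze, costs off, summary off, timing off) %s', query)
    loop
        if hide_hitmiss then
            -- Replace an exact zero first so it becomes "Zero", then any
            -- remaining (non-zero) count becomes "N".
            ln := regexp_replace(ln, 'Hits: 0', 'Hits: Zero');
            ln := regexp_replace(ln, 'Hits: \d+', 'Hits: N');
            ln := regexp_replace(ln, 'Misses: 0', 'Misses: Zero');
            ln := regexp_replace(ln, 'Misses: \d+', 'Misses: N');
        end if;
        -- Evictions and memory usage vary between machines, so always mask them.
        ln := regexp_replace(ln, 'Evictions: 0', 'Evictions: Zero');
        ln := regexp_replace(ln, 'Evictions: \d+', 'Evictions: N');
        ln := regexp_replace(ln, 'Memory Usage: \d+kB', 'Memory Usage: NkB');
        return next ln;
    end loop;
end;
$$;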
David
On Wed, 28 Apr 2021 at 16:14, David Rowley <dgrowleyml@gmail.com> wrote:
> However, I did add one test that sets work_mem down to 64kB to ensure
> the eviction code does get some exercise. You'll notice that I pass
> "true" to explain_resultcache() to hide the hits and misses there. We
> can't test the exact number of hits/misses/evictions there, but we can
> at least tell zero from non-zero, since I coded explain_resultcache()
> to replace the count with Zero or N depending on whether it was zero or
> above zero.
Thanks for the explanation. I did realize, after replying to Bharath
upthread, that I was wrong in assuming that the cache misses and cache
hits are always stable for non-parallel scans.
On Wed, 28 Apr 2021 at 13:54, David Rowley <dgrowleyml@gmail.com> wrote:
> So, given that I removed the parallel test in partition_prune.sql, and
> don't have any EXPLAIN ANALYZE output for parallel tests in
> resultcache.sql, it should be safe enough to put that cache_misses ==
> 0 test back into explain.c
>
> I've attached a patch to do this. The explain.c part is pretty similar
> to your patch, I just took my original code and comment.
Sounds good. And thanks for the cleanup patch, and the brief history.
Patch looks ok to me.
On Wed, 28 Apr 2021 at 23:05, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On Wed, 28 Apr 2021 at 13:54, David Rowley <dgrowleyml@gmail.com> wrote:
>> I've attached a patch to do this. The explain.c part is pretty similar
>> to your patch, I just took my original code and comment.
>
> Sounds good. And thanks for the cleanup patch, and the brief history.
> Patch looks ok to me.
Thanks for the review. I pushed the patch with a small additional
change to further tidy up show_resultcache_info().
David