plan with result cache is very slow when work_mem is not enough

Started by Pavel Stehule over 4 years ago · 24 messages
#1 Pavel Stehule
pavel.stehule@gmail.com

Hi

I am testing the new features of Postgres 14, and now I am trying to check the
result cache. Unfortunately, on my test data the results are not too good;
the behaviour is very nonlinear. Is this expected?

create table t1(a int, t2_id int);
insert into t1 select random() * 100000, random() * 100000 from
generate_series(1,1000000);
create table t2(b int, id int);
insert into t2 select random() * 100000, random() * 100000 from
generate_series(1,1000000);
create index on t2(id);

vacuum analyze t1, t2;
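
For reference, the query being explained in all of the plans below is:

explain analyze select * from t1, lateral(select count(*) from t2 where t1.t2_id = t2.id) s ;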

when work_mem is 40MB

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.65..472639.79 rows=1000000 width=16) (actual time=0.041..1078.882 rows=1000000 loops=1)
   ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.010..60.212 rows=1000000 loops=1)
   ->  Result Cache  (cost=4.65..4.67 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1000000)
         Cache Key: t1.t2_id
         Hits: 900006  Misses: 99994  Evictions: 0  Overflows: 0  Memory Usage: 10547kB
         ->  Aggregate  (cost=4.64..4.65 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=99994)
               ->  Index Only Scan using t2_id_idx on t2  (cost=0.42..4.62 rows=11 width=0) (actual time=0.002..0.003 rows=10 loops=99994)
                     Index Cond: (id = t1.t2_id)
                     Heap Fetches: 0
 Planning Time: 0.091 ms
 Execution Time: 1120.177 ms

when work_mem is 10MB

postgres=# set work_mem to '10MB'; -- 11MB is ok
SET
postgres=# explain analyze select * from t1, lateral(select count(*) from
t2 where t1.t2_id = t2.id) s ;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.65..472639.79 rows=1000000 width=16) (actual time=0.040..56576.187 rows=1000000 loops=1)
   ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.010..76.753 rows=1000000 loops=1)
   ->  Result Cache  (cost=4.65..4.67 rows=1 width=8) (actual time=0.056..0.056 rows=1 loops=1000000)
         Cache Key: t1.t2_id
         Hits: 884158  Misses: 115842  Evictions: 18752  Overflows: 0  Memory Usage: 10241kB
         ->  Aggregate  (cost=4.64..4.65 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=115842)
               ->  Index Only Scan using t2_id_idx on t2  (cost=0.42..4.62 rows=11 width=0) (actual time=0.003..0.004 rows=10 loops=115842)
                     Index Cond: (id = t1.t2_id)
                     Heap Fetches: 0
 Planning Time: 0.087 ms
 Execution Time: 56621.421 ms
(11 rows)

The query without result cache

postgres=# explain analyze select * from t1, lateral(select count(*) from
t2 where t1.t2_id = t2.id) s ;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.64..4689425.00 rows=1000000 width=16) (actual time=0.031..3260.858 rows=1000000 loops=1)
   ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.008..71.792 rows=1000000 loops=1)
   ->  Aggregate  (cost=4.64..4.65 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000000)
         ->  Index Only Scan using t2_id_idx on t2  (cost=0.42..4.62 rows=11 width=0) (actual time=0.002..0.002 rows=10 loops=1000000)
               Index Cond: (id = t1.t2_id)
               Heap Fetches: 0
 Planning Time: 0.081 ms
 Execution Time: 3293.543 ms
(8 rows)

Samples: 119K of event 'cycles', 4000 Hz, Event count (approx.):
Overhead  Shared Object  Symbol
  79.20%  postgres       [.] cache_reduce_memory
   1.94%  [kernel]       [k] native_write_msr_safe
   1.63%  [kernel]       [k] update_cfs_shares
   1.00%  [kernel]       [k] trigger_load_balance
   0.97%  [kernel]       [k] timerqueue_add
   0.51%  [kernel]       [k] task_tick_fair
   0.51%  [kernel]       [k] task_cputime
   0.50%  [kernel]       [k] perf_event_task_tick
   0.50%  [kernel]       [k] update_curr
   0.49%  [kernel]       [k] hrtimer_active

Regards

Pavel

#2 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#1)
Re: plan with result cache is very slow when work_mem is not enough

On Fri, 7 May 2021 at 20:24, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

I am testing new features of Postgres 14, and now I am trying to check the
result cache. Unfortunately on my test data, the result is not too good.
the behaviour is very non linear. Is it expected?

create table t1(a int, t2_id int);
insert into t1 select random() * 100000, random() * 100000 from
generate_series(1,1000000);
create table t2(b int, id int);
insert into t2 select random() * 100000, random() * 100000 from
generate_series(1,1000000);
create index on t2(id);

vacuum analyze t1, t2;

when work_mem is 40MB

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=4.65..472639.79 rows=1000000 width=16) (actual
time=0.041..1078.882 rows=1000000 loops=1)

-> Seq Scan on t1 (cost=0.00..14425.00 rows=1000000 width=8) (actual
time=0.010..60.212 rows=1000000 loops=1)

-> Result Cache (cost=4.65..4.67 rows=1 width=8) (actual
time=0.001..0.001 rows=1 loops=1000000)

Cache Key: t1.t2_id

Hits: 900006 Misses: 99994 Evictions: 0 Overflows: 0 Memory
Usage: 10547kB

-> Aggregate (cost=4.64..4.65 rows=1 width=8) (actual
time=0.003..0.003 rows=1 loops=99994)

-> Index Only Scan using t2_id_idx on t2 (cost=0.42..4.62
rows=11 width=0) (actual time=0.002..0.003 rows=10 loops=99994)

Index Cond: (id = t1.t2_id)

Heap Fetches: 0

Planning Time: 0.091 ms

Execution Time: 1120.177 ms

when work_mem is 10MB

postgres=# set work_mem to '10MB'; -- 11MB is ok
SET
postgres=# explain analyze select * from t1, lateral(select count(*) from
t2 where t1.t2_id = t2.id) s ;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..472639.79 rows=1000000 width=16) (actual
time=0.040..56576.187 rows=1000000 loops=1)
-> Seq Scan on t1 (cost=0.00..14425.00 rows=1000000 width=8) (actual
time=0.010..76.753 rows=1000000 loops=1)
-> Result Cache (cost=4.65..4.67 rows=1 width=8) (actual
time=0.056..0.056 rows=1 loops=1000000)
Cache Key: t1.t2_id
Hits: 884158 Misses: 115842 Evictions: 18752 Overflows: 0
Memory Usage: 10241kB
-> Aggregate (cost=4.64..4.65 rows=1 width=8) (actual
time=0.005..0.005 rows=1 loops=115842)
-> Index Only Scan using t2_id_idx on t2 (cost=0.42..4.62
rows=11 width=0) (actual time=0.003..0.004 rows=10 loops=115842)
Index Cond: (id = t1.t2_id)
Heap Fetches: 0
Planning Time: 0.087 ms
Execution Time: 56621.421 ms
(11 rows)

Would it be possible to disable caching when the number of evictions crosses
some limit?

Perhaps an average cache hit ratio against evictions could be calculated, and
when that ratio gets too bad, the cache could be bypassed (see the sketch
below).
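
A purely illustrative sketch of the kind of heuristic I mean (the struct,
names and thresholds here are made up for the example, not the real executor
code):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping, not the real ResultCacheState. */
typedef struct CacheStats
{
	uint64_t	hits;
	uint64_t	misses;
	uint64_t	evictions;
	bool		bypass;			/* stop caching, run the inner plan directly */
} CacheStats;

/*
 * Decide whether further caching is worthwhile.  If most misses end up
 * evicting an existing entry and the hit ratio stays low, the cache is
 * mostly churning and only adds overhead.
 */
static void
maybe_bypass_cache(CacheStats *s)
{
	uint64_t	lookups = s->hits + s->misses;

	if (lookups < 10000 || s->evictions == 0)
		return;					/* not enough evidence yet */

	if ((double) s->hits / lookups < 0.5 &&
		s->evictions > s->misses / 2)
		s->bypass = true;
}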


The query without result cache

postgres=# explain analyze select * from t1, lateral(select count(*) from
t2 where t1.t2_id = t2.id) s ;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.64..4689425.00 rows=1000000 width=16) (actual
time=0.031..3260.858 rows=1000000 loops=1)
-> Seq Scan on t1 (cost=0.00..14425.00 rows=1000000 width=8) (actual
time=0.008..71.792 rows=1000000 loops=1)
-> Aggregate (cost=4.64..4.65 rows=1 width=8) (actual
time=0.003..0.003 rows=1 loops=1000000)
-> Index Only Scan using t2_id_idx on t2 (cost=0.42..4.62
rows=11 width=0) (actual time=0.002..0.002 rows=10 loops=1000000)
Index Cond: (id = t1.t2_id)
Heap Fetches: 0
Planning Time: 0.081 ms
Execution Time: 3293.543 ms
(8 rows)

Samples: 119K of event 'cycles', 4000 Hz, Event count (approx.):
Overhead Shared Object Symbol
79.20% postgres [.] cache_reduce_memory
1.94% [kernel] [k] native_write_msr_safe
1.63% [kernel] [k] update_cfs_shares
1.00% [kernel] [k] trigger_load_balance
0.97% [kernel] [k] timerqueue_add
0.51% [kernel] [k] task_tick_fair
0.51% [kernel] [k] task_cputime
0.50% [kernel] [k] perf_event_task_tick
0.50% [kernel] [k] update_curr
0.49% [kernel] [k] hrtimer_active

Regards

Pavel

#3 Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Pavel Stehule (#2)
Re: plan with result cache is very slow when work_mem is not enough

Pavel Stehule wrote on 2021-05-07 21:45:

Samples: 119K of event 'cycles', 4000 Hz, Event count (approx.):
Overhead Shared Object Symbol
79.20% postgres [.] cache_reduce_memory
1.94% [kernel] [k] native_write_msr_safe
1.63% [kernel] [k] update_cfs_shares
1.00% [kernel] [k] trigger_load_balance
0.97% [kernel] [k] timerqueue_add
0.51% [kernel] [k] task_tick_fair
0.51% [kernel] [k] task_cputime
0.50% [kernel] [k] perf_event_task_tick
0.50% [kernel] [k] update_curr
0.49% [kernel] [k] hrtimer_active

Regards

Pavel

It is strange to see cache_reduce_memory itself consuming a lot of CPU.
It doesn't contain CPU-hungry code.
It calls prepare_probe_slot, which does some tuple forming. Then
it calls resultcache_lookup, which may call ResultCacheHash_hash
and ResultCacheHash_equal. And finally it calls remove_cache_entry.
I suppose remove_cache_entry should consume most of the CPU time, since
it does deallocations.
And if you compile with --enable-cassert, then remove_cache_entry
iterates through the whole cache hashtable, so it easily reaches
quadratic complexity (or more correctly M*N, where M is the size
of the table and N is the eviction count).
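
(A rough back-of-envelope, assuming the 10MB cache holds on the order of
100,000 entries: 18,752 evictions times roughly 100,000 entries walked per
eviction is on the order of 2 billion entry visits, which would fit the
profile above where cache_reduce_memory dominates the runtime.)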

regards,
Yura Sokolov

#4 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Yura Sokolov (#3)
Re: plan with result cache is very slow when work_mem is not enough

On 5/7/21 9:06 PM, Yura Sokolov wrote:

Pavel Stehule wrote on 2021-05-07 21:45:

Samples: 119K of event 'cycles', 4000 Hz, Event count (approx.):
Overhead  Shared Object             Symbol
79.20%  postgres                  [.] cache_reduce_memory
1.94%  [kernel]                  [k] native_write_msr_safe
1.63%  [kernel]                  [k] update_cfs_shares
1.00%  [kernel]                  [k] trigger_load_balance
0.97%  [kernel]                  [k] timerqueue_add
0.51%  [kernel]                  [k] task_tick_fair
0.51%  [kernel]                  [k] task_cputime
0.50%  [kernel]                  [k] perf_event_task_tick
0.50%  [kernel]                  [k] update_curr
0.49%  [kernel]                  [k] hrtimer_active

Regards

Pavel

It is strange to see cache_reduce_memory itself consumes a lot of CPU.
It doesn't contain CPU hungry code.
It calls prepare_probe_slot, that calls some tuple forming. Then
it calls resultcache_lookup that may call to ResultCacheHash_hash
and ResultCacheHash_equal. And finally it calls remove_cache_entry.
I suppose remove_cache_entry should consume most of CPU time since
it does deallocations.
And if you compile with --enable-cassert, then remove_cache_entry
iterates through whole cache hashtable, therefore it reaches
quadratic complexity easily (or more correct M*N, where M - size
of a table, N - eviction count).

Yeah. I tried reproducing the issue, but without success ...

Not sure what's wrong, but --enable-cassert is one option. Or maybe
there's some funny behavior due to collecting timing info?

FWIW the timings on my laptop look like this:

work_mem=40MB 5065ms
work_mem=10MB 5104ms
resultcache=off 13453ms

So a very different behavior from what Pavel reported. But if I rebuild
with casserts, I get the same massive slowdown, so I guess that's it.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Yura Sokolov (#3)
Re: plan with result cache is very slow when work_mem is not enough

On Fri, 7 May 2021 at 21:06, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

Pavel Stehule wrote on 2021-05-07 21:45:

Samples: 119K of event 'cycles', 4000 Hz, Event count (approx.):
Overhead Shared Object Symbol
79.20% postgres [.] cache_reduce_memory
1.94% [kernel] [k] native_write_msr_safe
1.63% [kernel] [k] update_cfs_shares
1.00% [kernel] [k] trigger_load_balance
0.97% [kernel] [k] timerqueue_add
0.51% [kernel] [k] task_tick_fair
0.51% [kernel] [k] task_cputime
0.50% [kernel] [k] perf_event_task_tick
0.50% [kernel] [k] update_curr
0.49% [kernel] [k] hrtimer_active

Regards

Pavel

It is strange to see cache_reduce_memory itself consumes a lot of CPU.
It doesn't contain CPU hungry code.
It calls prepare_probe_slot, that calls some tuple forming. Then
it calls resultcache_lookup that may call to ResultCacheHash_hash
and ResultCacheHash_equal. And finally it calls remove_cache_entry.
I suppose remove_cache_entry should consume most of CPU time since
it does deallocations.
And if you compile with --enable-cassert, then remove_cache_entry
iterates through whole cache hashtable, therefore it reaches
quadratic complexity easily (or more correct M*N, where M - size
of a table, N - eviction count).

yes, the slowdown is related to debug assertions

Pavel


regards,
Yura Sokolov

#6 David Rowley
dgrowleyml@gmail.com
In reply to: Pavel Stehule (#5)
Re: plan with result cache is very slow when work_mem is not enough

On Sat, 8 May 2021 at 07:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:

yes, the slowdown is related to debug assertions

With USE_ASSERT_CHECKING builds, I did add some code that verifies the
memory tracking is set correctly when evicting from the cache. This
code is pretty expensive as it loops over the entire cache to check
the memory accounting every time we evict something from the cache.
Originally, I had this code only run when some other constant was
defined, but I ended up changing it to compile that code in for all
assert enabled builds.

I considered that it might be too expensive as you can see from the
comment in [1]. I just wanted to get a few machines other than my own
to verify that the memory accounting code was working as expected.
There have been no complaints of any Assert failures yet, so maybe
it's safe to consider either removing the code entirely or just having
it run when some other more specific to purpose constant is defined.
If we did the latter, then I'd have concerns that nothing would ever
run the code to check the memory accounting, that's why I ended up
changing it to run with USE_ASSERT_CHECKING builds.

David

[1]: https://github.com/postgres/postgres/blob/master/src/backend/executor/nodeResultCache.c#L305

#7 Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Rowley (#6)
Re: plan with result cache is very slow when work_mem is not enough

On Fri, 7 May 2021 at 21:56, David Rowley <dgrowleyml@gmail.com> wrote:

On Sat, 8 May 2021 at 07:18, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

yes, the slowdown is related to debug assertions

With USE_ASSERT_CHECKING builds, I did add some code that verifies the
memory tracking is set correctly when evicting from the cache. This
code is pretty expensive as it loops over the entire cache to check
the memory accounting every time we evict something from the cache.
Originally, I had this code only run when some other constant was
defined, but I ended up changing it to compile that code in for all
assert enabled builds.

I considered that it might be too expensive as you can see from the
comment in [1]. I just wanted to get a few machines other than my own
to verify that the memory accounting code was working as expected.
There have been no complaints of any Assert failures yet, so maybe
it's safe to consider either removing the code entirely or just having
it run when some other more specific to purpose constant is defined.
If we did the latter, then I'd have concerns that nothing would ever
run the code to check the memory accounting, that's why I ended up
changing it to run with USE_ASSERT_CHECKING builds.

I understand. I think this is too slow for generic assertions, because the
overhead is about 50x.

But I understand that it may be necessary to have this code active for some time.

Regards

Pavel


David

[1]
https://github.com/postgres/postgres/blob/master/src/backend/executor/nodeResultCache.c#L305

#8 David Rowley
dgrowleyml@gmail.com
In reply to: Pavel Stehule (#7)
Re: plan with result cache is very slow when work_mem is not enough

On Sat, 8 May 2021 at 08:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:

On Fri, 7 May 2021 at 21:56, David Rowley <dgrowleyml@gmail.com> wrote:

With USE_ASSERT_CHECKING builds, I did add some code that verifies the
memory tracking is set correctly when evicting from the cache. This
code is pretty expensive as it loops over the entire cache to check
the memory accounting every time we evict something from the cache.
Originally, I had this code only run when some other constant was
defined, but I ended up changing it to compile that code in for all
assert enabled builds.

I considered that it might be too expensive as you can see from the
comment in [1]. I just wanted to get a few machines other than my own
to verify that the memory accounting code was working as expected.
There have been no complaints of any Assert failures yet, so maybe
it's safe to consider either removing the code entirely or just having
it run when some other more specific to purpose constant is defined.
If we did the latter, then I'd have concerns that nothing would ever
run the code to check the memory accounting, that's why I ended up
changing it to run with USE_ASSERT_CHECKING builds.

I understand. I think this is too slow for generic assertions, because the overhead is about 50x.

I didn't expect it would show up quite that much. If you scaled the
test up a bit more and increased work_mem further, then it would be
even more than 50x.

At one point when I was developing the patch, I had two high water
marks for cache memory. When we reached the upper of the two marks,
I'd reduce the memory down to the lower of two marks. The lower of
the two marks was set to 98% of the higher mark. In the end, I got
rid of that as I didn't really see what extra overhead there was from
just running the eviction code every time we require another byte.
However, if we did have that again, then the memory checking could
just be done when we run the eviction code. We'd then need to consume
that 2% more memory before it would run again.
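
For illustration only, the two-mark scheme would have looked roughly like
this (a simplified sketch with made-up names, not the actual patch code):

#include <stdint.h>

/* Illustrative struct; not the real ResultCacheState. */
typedef struct CacheMem
{
	uint64_t	mem_used;
	uint64_t	mem_upper;		/* hard limit, e.g. work_mem */
	uint64_t	mem_lower;		/* roughly 98% of mem_upper */
} CacheMem;

/* Stand-in for evicting the least recently used entry; returns bytes freed. */
static uint64_t
evict_one_entry(CacheMem *cm)
{
	(void) cm;
	return 1024;				/* pretend each entry is about 1kB */
}

static void
cache_add_bytes(CacheMem *cm, uint64_t nbytes)
{
	cm->mem_used += nbytes;

	if (cm->mem_used <= cm->mem_upper)
		return;

	/*
	 * Only when the upper mark is crossed do we evict, and then we evict all
	 * the way down to the lower mark.  Expensive validation (such as the
	 * assert checks) could run here, i.e. once per ~2% of work_mem consumed
	 * rather than once per eviction.
	 */
	while (cm->mem_used > cm->mem_lower)
	{
		uint64_t	freed = evict_one_entry(cm);

		cm->mem_used = (freed < cm->mem_used) ? cm->mem_used - freed : 0;
	}
}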

My current thinking is that I don't really want to add that complexity
just for some Assert code. I'd only want to do it if there was another
valid reason to.

Another thought I have is that maybe it would be ok just to move
memory accounting debug code so it only runs once in
ExecEndResultCache. I'm struggling to imagine that if the memory
tracking did go out of whack, the problem would have accidentally
fixed itself by the time we got to ExecEndResultCache(). I guess even
if the accounting was counting far too much memory and we'd evicted
everything from the cache to try and get the memory usage down, we'd
still find the problem during ExecEndResultCache(), even if the cache
had become completely empty as a result.

David

#9 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: David Rowley (#8)
Re: plan with result cache is very slow when work_mem is not enough

On 5/7/21 11:04 PM, David Rowley wrote:

On Sat, 8 May 2021 at 08:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:

On Fri, 7 May 2021 at 21:56, David Rowley <dgrowleyml@gmail.com> wrote:

With USE_ASSERT_CHECKING builds, I did add some code that verifies the
memory tracking is set correctly when evicting from the cache. This
code is pretty expensive as it loops over the entire cache to check
the memory accounting every time we evict something from the cache.
Originally, I had this code only run when some other constant was
defined, but I ended up changing it to compile that code in for all
assert enabled builds.

I considered that it might be too expensive as you can see from the
comment in [1]. I just wanted to get a few machines other than my own
to verify that the memory accounting code was working as expected.
There have been no complaints of any Assert failures yet, so maybe
it's safe to consider either removing the code entirely or just having
it run when some other more specific to purpose constant is defined.
If we did the latter, then I'd have concerns that nothing would ever
run the code to check the memory accounting, that's why I ended up
changing it to run with USE_ASSERT_CHECKING builds.

I understand. I think this is too slow for generic assertions, because the overhead is about 50x.

I didn't expect it would show up quite that much. If you scaled the
test up a bit more and increased work_mem further, then it would be
even more than 50x.

At one point when I was developing the patch, I had two high water
marks for cache memory. When we reached the upper of the two marks,
I'd reduce the memory down to the lower of two marks. The lower of
the two marks was set to 98% of the higher mark. In the end, I got
rid of that as I didn't really see what extra overhead there was from
just running the eviction code every time we require another byte.
However, if we did have that again, then the memory checking could
just be done when we run the eviction code. We'd then need to consume
that 2% more memory before it would run again.

My current thinking is that I don't really want to add that complexity
just for some Assert code. I'd only want to do it if there was another
valid reason to.

Agreed. I think this approach to eviction (i.e. evicting more than you
need) would be useful if the actual eviction code was expensive, and
doing it in a "batch" would make it significantly cheaper. But I don't
think "asserts are expensive" is a good reason for it.

Another thought I have is that maybe it would be ok just to move
memory accounting debug code so it only runs once in
ExecEndResultCache. I struggling to imagine that if the memory
tracking did go out of whack, that the problem would have accidentally
fixed itself by the time we got to ExecEndResultCache(). I guess even
if the accounting was counting far too much memory and we'd evicted
everything from the cache to try and get the memory usage down, we'd
still find the problem during ExecEndResultCache(), even if the cache
had become completely empty as a result.

I don't think postponing the debug code until much later is a great
idea. When something goes wrong it's good to know ASAP, otherwise it's
much more difficult to identify the issue.

Not sure we need to do something here - for regression tests this is not
an issue, because those generally work with small data sets. And if you
run with asserts on large amounts of data, I think this is acceptable.

I had the same dilemma with the new BRIN index opclasses, which also
have some extensive and expensive assert checks - for the regression
tests that's fine, and it proved very useful during development.

I have considered enabling those extra checks only on request somehow,
but I'd bet no one would do that and I'd forget it exists pretty soon.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10 David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#9)
1 attachment(s)
Re: plan with result cache is very slow when work_mem is not enough

On Sat, 8 May 2021 at 09:16, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

On 5/7/21 11:04 PM, David Rowley wrote:

Another thought I have is that maybe it would be ok just to move
memory accounting debug code so it only runs once in
ExecEndResultCache. I struggling to imagine that if the memory
tracking did go out of whack, that the problem would have accidentally
fixed itself by the time we got to ExecEndResultCache(). I guess even
if the accounting was counting far too much memory and we'd evicted
everything from the cache to try and get the memory usage down, we'd
still find the problem during ExecEndResultCache(), even if the cache
had become completely empty as a result.

I don't think postponing the debug code until much later is a great
idea. When something goes wrong it's good to know ASAP, otherwise it's
much more difficult to identify the issue.

I thought about this a bit and I was about to agree, but then I changed my mind.

The biggest concern I have on this topic is that we end up with zero
validation done for the memory accounting. If we can find a cheaper
place to put the Asserts that will at least bring our attention to the
fact that some problem exists, then more investigation can ensue. I
don't personally expect that every assert failure will show us the
exact location of the bug.

Additionally, I don't really think there is a huge amount of room for
bugs creeping in here as there's not all that many places that the
'mem_used' field gets updated, so there are not many places to forget
to do it.

Another way to look at this is that, where the Asserts are today,
there are zero memory accounting checks done in all cases that don't
evict any tuples. I feel by moving the tests to ExecEndResultCache()
we'll do memory validation for all plans using a Result Cache in
Assert builds. Yes, we might just need to do a bit more work to find
out exactly where the problem is, but some investigation would need to
happen anyway. I think if anyone changes anything which breaks the
memory accounting then they'll be aware of it quite quickly and they
can just look at what they did wrong.

David

Attachments:

only_validate_resultcache_mem_on_plan_shutdown.patch (application/octet-stream)
diff --git a/src/backend/executor/nodeResultCache.c b/src/backend/executor/nodeResultCache.c
index 534d733eb3..919238d1ff 100644
--- a/src/backend/executor/nodeResultCache.c
+++ b/src/backend/executor/nodeResultCache.c
@@ -298,41 +298,6 @@ remove_cache_entry(ResultCacheState *rcstate, ResultCacheEntry *entry)
 
 	dlist_delete(&entry->key->lru_node);
 
-#ifdef USE_ASSERT_CHECKING
-
-	/*
-	 * Validate the memory accounting code is correct in assert builds. XXX is
-	 * this too expensive for USE_ASSERT_CHECKING?
-	 */
-	{
-		int			i,
-					count;
-		uint64		mem = 0;
-
-		count = 0;
-		for (i = 0; i < rcstate->hashtable->size; i++)
-		{
-			ResultCacheEntry *entry = &rcstate->hashtable->data[i];
-
-			if (entry->status == resultcache_SH_IN_USE)
-			{
-				ResultCacheTuple *tuple = entry->tuplehead;
-
-				mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
-				while (tuple != NULL)
-				{
-					mem += CACHE_TUPLE_BYTES(tuple);
-					tuple = tuple->next;
-				}
-				count++;
-			}
-		}
-
-		Assert(count == rcstate->hashtable->members);
-		Assert(mem == rcstate->mem_used);
-	}
-#endif
-
 	/* Remove all of the tuples from this entry */
 	entry_purge_tuples(rcstate, entry);
 
@@ -977,6 +942,35 @@ ExecInitResultCache(ResultCache *node, EState *estate, int eflags)
 void
 ExecEndResultCache(ResultCacheState *node)
 {
+#ifdef USE_ASSERT_CHECKING
+	/* Validate the memory accounting code is correct in assert builds. */
+	{
+		int			count;
+		uint64		mem = 0;
+		resultcache_iterator i;
+		ResultCacheEntry *entry;
+
+		resultcache_start_iterate(node->hashtable, &i);
+
+		count = 0;
+		while ((entry = resultcache_iterate(node->hashtable, &i)) != NULL)
+		{
+			ResultCacheTuple *tuple = entry->tuplehead;
+
+			mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
+			while (tuple != NULL)
+			{
+				mem += CACHE_TUPLE_BYTES(tuple);
+				tuple = tuple->next;
+			}
+			count++;
+		}
+
+		Assert(count == node->hashtable->members);
+		Assert(mem == node->mem_used);
+	}
+#endif
+
 	/*
 	 * When ending a parallel worker, copy the statistics gathered by the
 	 * worker back into shared memory so that it can be picked up by the main
#11 Justin Pryzby
pryzby@telsasoft.com
In reply to: David Rowley (#10)
Re: plan with result cache is very slow when work_mem is not enough

On Sat, May 08, 2021 at 02:26:44PM +1200, David Rowley wrote:

On Sat, 8 May 2021 at 09:16, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

On 5/7/21 11:04 PM, David Rowley wrote:

Another thought I have is that maybe it would be ok just to move
memory accounting debug code so it only runs once in
ExecEndResultCache. I struggling to imagine that if the memory
tracking did go out of whack, that the problem would have accidentally
fixed itself by the time we got to ExecEndResultCache(). I guess even
if the accounting was counting far too much memory and we'd evicted
everything from the cache to try and get the memory usage down, we'd
still find the problem during ExecEndResultCache(), even if the cache
had become completely empty as a result.

I don't think postponing the debug code until much later is a great
idea. When something goes wrong it's good to know ASAP, otherwise it's
much more difficult to identify the issue.

I thought about this a bit and I was about to agree, but then I changed my mind.

Yes, we might just need to do a bit more work to find
out exactly where the problem is, but some investigation would need to
happen anyway. I think if anyone changes anything which breaks the
memory accounting then they'll be aware of it quite quickly and they
can just look at what they did wrong.

You could put this into a separate function called by ExecEndResultCache().
Then anyone that breaks the memory accounting can also call the function in the
places they changed to help figure out what they broke.

- * Validate the memory accounting code is correct in assert builds. XXX is
- * this too expensive for USE_ASSERT_CHECKING?
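
For example, the validation walk from the patch above could be factored out
into something like this (the wrapper name is invented here; the body is just
the loop the patch moves into ExecEndResultCache()):

#ifdef USE_ASSERT_CHECKING
/* Hypothetical helper; name invented for this example. */
static void
ResultCacheValidateMemory(ResultCacheState *rcstate)
{
	uint64		mem = 0;
	int			count = 0;
	resultcache_iterator i;
	ResultCacheEntry *entry;

	resultcache_start_iterate(rcstate->hashtable, &i);
	while ((entry = resultcache_iterate(rcstate->hashtable, &i)) != NULL)
	{
		ResultCacheTuple *tuple = entry->tuplehead;

		mem += EMPTY_ENTRY_MEMORY_BYTES(entry);
		while (tuple != NULL)
		{
			mem += CACHE_TUPLE_BYTES(tuple);
			tuple = tuple->next;
		}
		count++;
	}

	Assert(count == rcstate->hashtable->members);
	Assert(mem == rcstate->mem_used);
}
#endif

ExecEndResultCache() would then just call ResultCacheValidateMemory(node)
under USE_ASSERT_CHECKING, and anyone debugging a suspected accounting bug
could temporarily add calls elsewhere, e.g. in remove_cache_entry().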

--
Justin

#12 David Rowley
dgrowleyml@gmail.com
In reply to: Justin Pryzby (#11)
Re: plan with result cache is very slow when work_mem is not enough

On Sat, 8 May 2021 at 14:43, Justin Pryzby <pryzby@telsasoft.com> wrote:

You could put this into a separate function called by ExecEndResultCache().
Then anyone that breaks the memory accounting can also call the function in the
places they changed to help figure out what they broke.

I almost did it that way and left a call to it in remove_cache_entry()
#ifdef'd out. However, as mentioned, I'm more concerned about the
accounting being broken and left broken than I am with making it take
a little less time to find the exact place to fix the breakage. If
the breakage was to occur when adding a new entry to the cache then it
might not narrow it down much if we just give users an easy way to
check the memory accounting during only evictions. The only way to
highlight the problem as soon as it occurs would be to validate the
memory tracking every time the mem_used field is changed. I think that
would be overkill.

I also find it hard to imagine what other reasons we'll have in the
future to adjust 'mem_used'. At the moment there are 4 places. Two
that add bytes and two that subtract bytes. They're all hidden inside
reusable functions that are in charge of adding and removing entries
from the cache.

David

#13 Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Tomas Vondra (#9)
Re: plan with result cache is very slow when work_mem is not enough

On 8 May 2021 at 00:16:54 GMT+03:00, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

On 5/7/21 11:04 PM, David Rowley wrote:

On Sat, 8 May 2021 at 08:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:

On Fri, 7 May 2021 at 21:56, David Rowley <dgrowleyml@gmail.com> wrote:

With USE_ASSERT_CHECKING builds, I did add some code that verifies the
memory tracking is set correctly when evicting from the cache. This
code is pretty expensive as it loops over the entire cache to check
the memory accounting every time we evict something from the cache.
Originally, I had this code only run when some other constant was
defined, but I ended up changing it to compile that code in for all
assert enabled builds.

I considered that it might be too expensive as you can see from the
comment in [1]. I just wanted to get a few machines other than my own
to verify that the memory accounting code was working as expected.
There have been no complaints of any Assert failures yet, so maybe
it's safe to consider either removing the code entirely or just having
it run when some other more specific to purpose constant is defined.
If we did the latter, then I'd have concerns that nothing would ever
run the code to check the memory accounting, that's why I ended up
changing it to run with USE_ASSERT_CHECKING builds.

I understand. I think this is too slow for generic assertions, because
the overhead is about 50x.

I didn't expect it would show up quite that much. If you scaled the
test up a bit more and increased work_mem further, then it would be
even more than 50x.

At one point when I was developing the patch, I had two high water
marks for cache memory. When we reached the upper of the two marks,
I'd reduce the memory down to the lower of two marks. The lower of
the two marks was set to 98% of the higher mark. In the end, I got
rid of that as I didn't really see what extra overhead there was from
just running the eviction code every time we require another byte.
However, if we did have that again, then the memory checking could
just be done when we run the eviction code. We'd then need to consume
that 2% more memory before it would run again.

My current thinking is that I don't really want to add that complexity
just for some Assert code. I'd only want to do it if there was another
valid reason to.

Agreed. I think this approach to eviction (i.e. evicting more than you
need) would be useful if the actual eviction code was expensive, and
doing it in a "batch" would make it significantly cheaper. But I don't
think "asserts are expensive" is a good reason for it.

Another thought I have is that maybe it would be ok just to move
memory accounting debug code so it only runs once in
ExecEndResultCache. I'm struggling to imagine that if the memory
tracking did go out of whack, the problem would have accidentally
fixed itself by the time we got to ExecEndResultCache(). I guess even
if the accounting was counting far too much memory and we'd evicted
everything from the cache to try and get the memory usage down, we'd
still find the problem during ExecEndResultCache(), even if the cache
had become completely empty as a result.

I don't think postponing the debug code until much later is a great
idea. When something goes wrong it's good to know ASAP, otherwise it's
much more difficult to identify the issue.

Not sure we need to do something here - for regression tests this is not
an issue, because those generally work with small data sets. And if you
run with asserts on large amounts of data, I think this is acceptable.

I had the same dilemma with the new BRIN index opclasses, which also
have some extensive and expensive assert checks - for the regression
tests that's fine, and it proved very useful during development.

I have considered enabling those extra checks only on request somehow,
but I'd bet no one would do that and I'd forget it exists pretty soon.

regards

Perhaps there is a need for a "heavy asserts" flag, or an option like
"--enable-cassert=heavy". Then USE_ASSERT_CHECKING could be defined to the
integer 1 or 2 depending on the "heaviness" of the enabled checks.
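
Roughly like this, as an illustration only (no such configure option exists
today):

/*
 * Illustrative sketch of what pg_config.h could contain if configure grew a
 * hypothetical --enable-cassert=heavy mode.
 */
#define USE_ASSERT_CHECKING 2	/* 1 = normal asserts, 2 = heavy asserts */

/* Call sites could then guard the expensive checks: */
#if defined(USE_ASSERT_CHECKING) && USE_ASSERT_CHECKING >= 2
#define USE_HEAVY_ASSERT_CHECKING 1
#endif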

regards
Yura Sokolov

#14 David Rowley
dgrowleyml@gmail.com
In reply to: Yura Sokolov (#13)
Re: plan with result cache is very slow when work_mem is not enough

On Sat, 8 May 2021 at 19:03, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

Perhaps there is need for flag "heavy asserts". Or option "--enable-cassert=heavy". Then USE_ASSERT_CHECKING could be defined to integer 1 or 2 depending on "heaviness" of enabled checks.

I'd rather explore all other options before we went and did something
like that. The reason I feel this way is that, a couple of jobs ago, I
worked on a project that had quite a large test suite. Running the
tests on a typical dev machine took about 16 hours. When
writing new tests, nobody really ever gave much thought to how long
tests would take to run. The time it took had already grown to
something much longer than was practical to run before a commit.
Instead, we had a test farm. We'd commit to master (yuck) and the test
farm would run the tests. Unfortunately, the farm was not big enough
to run every test on every commit, so if it encountered a test failure
(as it commonly did) it tried to bisect and find the first breaking
commit for that test then alert the person (probably the next day).
Bisecting works out pretty badly when someone commits a build-breaking
commit.

Now that I work on PostgreSQL, I very much like the fact that I can
run make check-world -j and see very quickly that no tests fail. I'd
rather not have something like heavy asserts as I believe it would
just become a dumping ground for badly thought through Asserts that
most likely could be done in some better way that didn't require so
much effort. (I'll put mine in that category) It might not take long
before most committers opted to run make check-world without the heavy
asserts because it became just too slow. When that happens, it means
the buildfarm is left to catch these errors and most likely the
feedback will take much longer to get back to the committer due to the
animals running these flags also having to churn through the slow
tests.

So, I'm -1 for "heavy asserts".

David

#15 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: David Rowley (#14)
Re: plan with result cache is very slow when work_mem is not enough

On 5/8/21 9:39 AM, David Rowley wrote:

On Sat, 8 May 2021 at 19:03, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

Perhaps there is need for flag "heavy asserts". Or option "--enable-cassert=heavy". Then USE_ASSERT_CHECKING could be defined to integer 1 or 2 depending on "heaviness" of enabled checks.

I'd rather explore all other options before we went and did something
like that. The reason I feel this way is; a couple of jobs ago, I
worked on a project that had quite a large test suite. In order to
run the tests on a typical dev machine took about 16 hours. When
writing new tests, nobody really ever gave much thought to how long
tests would take to run. The time it took had already grown to
something much longer than was practical to run before a commit.
Instead, we had a test farm. We'd commit to master (yuck) and the test
farm would run the tests. Unfortunately, the farm was not big enough
to run every test on every commit, so if it encountered a test failure
(as it commonly did) it tried to bisect and find the first breaking
commit for that test then alert the person (probably the next day).
bisecting works out pretty badly when someone commits a build breaking
commit.

Now that I work on PostgreSQL, I very much like the fact that I can
run make check-world -j and see very quickly that no tests fail. I'd
rather not have something like heavy asserts as I believe it would
just become a dumping ground for badly thought through Asserts that
most likely could be done in some better way that didn't require so
much effort. (I'll put mine in that category) It might not take long
before most committers opted to run make check-world without the heavy
asserts because it became just too slow. When that happens, it means
the buildfarm is left to catch these errors and most likely the
feedback will take much longer to get back to the committer due to the
animals running these flags also having to churn through the slow
tests.

So, I'm -1 for "heavy asserts".

Yeah, -1 from me too.

But I think we should differentiate between asserts that (a)
significantly affect "our" regression tests with limited amounts of
data, and (b) asserts that are an issue for large data sets. The asserts
discussed in this thread are in (b), but not (a).

I don't know if there's a better way to do these tests, but if there's
not I'd not worry about it too much for now.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16 David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#15)
Re: plan with result cache is very slow when work_mem is not enough

On Sat, 8 May 2021 at 22:33, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

I don't know if there's a better way to do these tests, but if there's
not I'd not worry about it too much for now.

So you're -1 on the proposed patch?

David

#17 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: David Rowley (#16)
Re: plan with result cache is very slow when work_mem is not enough

On 5/8/21 1:27 PM, David Rowley wrote:

On Sat, 8 May 2021 at 22:33, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

I don't know if there's a better way to do these tests, but if there's
not I'd not worry about it too much for now.

So you're -1 on the proposed patch?

Oh! I hadn't noticed there was a patch. No, I'm not against the patch,
if it improves the testing (and you're right, it's better to do some
testing even without evictions).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18 Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Rowley (#14)
Re: plan with result cache is very slow when work_mem is not enough

On Sat, 8 May 2021 at 9:39, David Rowley <dgrowleyml@gmail.com> wrote:

On Sat, 8 May 2021 at 19:03, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

Perhaps there is need for flag "heavy asserts". Or option
"--enable-cassert=heavy". Then USE_ASSERT_CHECKING could be defined to
integer 1 or 2 depending on "heaviness" of enabled checks.

I'd rather explore all other options before we went and did something
like that. The reason I feel this way is; a couple of jobs ago, I
worked on a project that had quite a large test suite. In order to
run the tests on a typical dev machine took about 16 hours. When
writing new tests, nobody really ever gave much thought to how long
tests would take to run. The time it took had already grown to
something much longer than was practical to run before a commit.
Instead, we had a test farm. We'd commit to master (yuck) and the test
farm would run the tests. Unfortunately, the farm was not big enough
to run every test on every commit, so if it encountered a test failure
(as it commonly did) it tried to bisect and find the first breaking
commit for that test then alert the person (probably the next day).
bisecting works out pretty badly when someone commits a build breaking
commit.

Now that I work on PostgreSQL, I very much like the fact that I can
run make check-world -j and see very quickly that no tests fail. I'd
rather not have something like heavy asserts as I believe it would
just become a dumping ground for badly thought through Asserts that
most likely could be done in some better way that didn't require so
much effort. (I'll put mine in that category) It might not take long
before most committers opted to run make check-world without the heavy
asserts because it became just too slow. When that happens, it means
the buildfarm is left to catch these errors and most likely the
feedback will take much longer to get back to the committer due to the
animals running these flags also having to churn through the slow
tests.

So, I'm -1 for "heavy asserts".

Personally, I have no problem with slow assertions, although it is not
too practical. The main problem is the shock, and the feeling that something
is wrong. I spent an hour working out whether or not it was a bug.

Would it be possible to identify this situation?

Maybe use some specific name for this routine, like

assert_only_check_xxxx

Then I could see this in perf, and I wouldn't need to do any other or deeper
checks.

Regards

Pavel


David

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#18)
Re: plan with result cache is very slow when work_mem is not enough

Pavel Stehule <pavel.stehule@gmail.com> writes:

On Sat, 8 May 2021 at 9:39, David Rowley <dgrowleyml@gmail.com> wrote:

So, I'm -1 for "heavy asserts".

Personally, I have not problem with too slow assertions, although it is not
too practical.

I'm very much on David's side here. I'm currently trying to figure out
why CLOBBER_CACHE_ALWAYS tests have gotten something like 10% slower
in the past six weeks or so --- if you look at hyrax for example,
it finished "make check" in 12:52:44 in its last fully-green run
back in mid-March, while the latest report shows 14:08:18. I think
something needs to be done about that. It's bad enough that no
developers will run cache-clobber tests by hand; but if it takes
a week to get buildfarm feedback, you've got exactly the same
problems David describes.

regards, tom lane

#20 David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#17)
Re: plan with result cache is very slow when work_mem is not enough

On Sun, 9 May 2021 at 00:07, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

On 5/8/21 1:27 PM, David Rowley wrote:

On Sat, 8 May 2021 at 22:33, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

I don't know if there's a better way to do these tests, but if there's
not I'd not worry about it too much for now.

So you're -1 on the proposed patch?

Oh! I have not noticed there was a patch. No, I'm not against the patch,
if it improves the testing (and you're right it's better to do some
testing even without evictions).

There seems to be plenty of other ideas floating around, but since
nobody seems to be objecting to reducing the Assert overheads, I've
pushed the proposed patch.

The more I thought about this, the more I think I should have done it
this way in the first place.

David

#21 David Rowley
dgrowleyml@gmail.com
In reply to: Pavel Stehule (#18)
Re: plan with result cache is very slow when work_mem is not enough

On Sun, 9 May 2021 at 03:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Personally, I have not problem with too slow assertions, although it is not too practical. The main problem is some shock, and feeling so some is wrong. I spent 1 hour detecting if it is a bug or not.

Thanks for spending the time figuring out where the slowness came from.

Can it be possible to identify this situation?

Maybe use some specific name of this routine - like

assert_only_check_xxxx

Then I can see this warning in perf, and I don't need to do other or deeper checks

I don't think we need to go around leaving clues for people who run
perf on cassert builds. I think anyone doing that should just never
expect any meaningful results.

David

#22 Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Rowley (#21)
Re: plan with result cache is very slow when work_mem is not enough

On Sun, 9 May 2021 at 3:01, David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, 9 May 2021 at 03:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Personally, I have not problem with too slow assertions, although it is
not too practical. The main problem is some shock, and feeling so some is
wrong. I spent 1 hour detecting if it is a bug or not.

Thanks for spending the time figuring out where the slowness came from.

Can it be possible to identify this situation?

Maybe use some specific name of this routine - like

assert_only_check_xxxx

Then I can see this warning in perf, and I don't need to do other or
deeper checks

I don't think we need to go around leaving clues for people who run
perf on cassert builds. I think anyone doing that should just never
expect any meaningful results.

ok

Pavel


David

#23 Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: David Rowley (#21)
Re: plan with result cache is very slow when work_mem is not enough

David Rowley wrote on 2021-05-09 04:01:

On Sun, 9 May 2021 at 03:29, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Personally, I have not problem with too slow assertions, although it
is not too practical. The main problem is some shock, and feeling so
some is wrong. I spent 1 hour detecting if it is a bug or not.

Thanks for spending the time figuring out where the slowness came from.

Can it be possible to identify this situation?

Maybe use some specific name of this routine - like

assert_only_check_xxxx

Then I can see this warning in perf, and I don't need to do other or
deeper checks

I don't think we need to go around leaving clues for people who run
perf on cassert builds. I think anyone doing that should just never
expect any meaningful results.

Occasionally there is a need to run cassert builds in production to
catch an issue. It is usually ok if a cassert build is O(1) slower than
an optimized build (i.e. it is slower by some constant factor C). But
if a cassert build is quadratically slower, it will be unusable.

regards,
Yura

#24 Andres Freund
andres@anarazel.de
In reply to: Yura Sokolov (#23)
Re: plan with result cache is very slow when work_mem is not enough

Hi,

On 2021-05-09 15:57:22 +0300, Yura Sokolov wrote:

Occasionally there is a need to run cassert builds in production to
catch an issue. It is usually ok if cassert build O(1) slower than
optimized biuld (ie it is slower in some constant factor C). But
if cassert build will be quadratically slower, it will unusable.

The memory context assertion overhead is more than O(1) expensive. I
think there are plenty of other cases like it. We removed some (e.g. it used
to be that we scanned O(#shared_buffers) entries in the local pin table
at the end of the transaction). I don't think we want to limit ourselves
to O(1) checks. That's not to say we should have O(n^2) or such,
unless we have confidence that n will rarely be big.

Greetings,

Andres Freund