Reducing opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS
In a nearby thread I bemoaned the fact that the core regression tests
seem to have gotten significantly slower in the last couple of months,
at least with CCA enabled: hyrax reports completing them in 12:52:44
on 18 March, while its most recent run on 1 May took 14:08:18.
Trying to diagnose the cause overall seemed a bit daunting, but
I thought I'd dig into the opr_sanity test in particular, as it
is one of the slower tests under CCA to start with and had also
slowed down noticeably (from 3701581 ms to 4761183 ms, or 28%).
I was able to complete a bisection using just that test, and
got an unexpected result: most of the slowdown appeared at
ab596105b (BRIN minmax-multi indexes). Apparently the additional
time is simply from having to check the additional pg_amop and
pg_amproc entries, which that patch added quite a few of.
I noticed that all of the slowest queries in that test were dependent
on the binary_coercible() plpgsql function that it uses. Now, that
function has always been a rather lame attempt to approximate the
behavior of the parser's IsBinaryCoercible() function, so I've been
thinking for some time that we ought to get rid of it in favor of
actually using IsBinaryCoercible(). I tried that, by adding a
shim function to regress.c, and got a most gratifying result:
on my machine opr_sanity's runtime with
debug_invalidate_system_caches_always = 1 drops from
29m9s to 3m19s. Without CCA the speedup is far less impressive,
360ms to 305ms, but that's still useful. Especially since this
makes the test strictly more accurate.
(I am thinking that this suggests that plpgsql may be hurt more
by cache clobbers than it really needs to be; but doing anything
about that would require some research.)
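For concreteness, such a shim needs to be little more than a V1 wrapper around the parser's function. A minimal sketch of what the regress.c addition could look like (my reading of the approach described above, not necessarily the committed code):

```c
#include "postgres.h"

#include "fmgr.h"
#include "parser/parse_coerce.h"

/*
 * Sketch of a regress.c shim exposing IsBinaryCoercible() to SQL,
 * replacing the old plpgsql approximation.  A plausible shape, not
 * a verbatim copy of the patch.
 */
PG_FUNCTION_INFO_V1(binary_coercible);

Datum
binary_coercible(PG_FUNCTION_ARGS)
{
	Oid			srctype = PG_GETARG_OID(0);
	Oid			targettype = PG_GETARG_OID(1);

	PG_RETURN_BOOL(IsBinaryCoercible(srctype, targettype));
}
```

The corresponding CREATE FUNCTION declaration then has to run before opr_sanity, which is what forces the schedule change discussed below.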
Anyway, I propose that we ought to sneak this into HEAD, since
it's only touching test code and not anything production-critical.
The patch is a bit more invasive than I would have liked, because
adding the SQL definition of binary_coercible() to create_function_1
(where the other regress.c functions are declared) didn't work:
that runs after opr_sanity, and just moving it up to before
opr_sanity causes the latter to complain about some of the functions
in it. So I ended up splitting the create_function_1 test into
create_function_0 and create_function_1. It's annoying from a
parallelism standpoint that create_function_0 runs by itself, but
the two parallel groups ahead of it are already full. Maybe we
should rebalance that by moving a few of those tests to run in
parallel with create_function_0, but I didn't do that here.
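Concretely, the schedule change amounts to giving the new test its own line in src/test/regress/parallel_schedule (a sketch, not the actual file contents):

```
# Sketch of the parallel_schedule change: create_function_0 gets its
# own group so that binary_coercible() exists before opr_sanity runs.
test: create_function_0
```

create_function_1 stays where it was, after opr_sanity.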
Thoughts?
regards, tom lane
Attachments:
make-binary_coercible-a-C-function-1.patch (text/x-diff; +99, -148)
On Sat, May 08, 2021 at 03:44:57PM -0400, Tom Lane wrote:
I tried that, by adding a
shim function to regress.c, and got a most gratifying result:
on my machine opr_sanity's runtime with
debug_invalidate_system_caches_always = 1 drops from
29m9s to 3m19s. Without CCA the speedup is far less impressive,
360ms to 305ms, but that's still useful. Especially since this
makes the test strictly more accurate.
The speedup is quite welcome and still impressive in both cases.
Anyway, I propose that we ought to sneak this into HEAD, since
it's only touching test code and not anything production-critical.
+1 for pushing it in HEAD.
Looking at the patch, explicitly_binary_coercible wasn't used since
e9f42d529f990f94e1b7bdcec4a1111465c85326 (and was renamed there too). Just to
be sure, is it ok to remove it, as it was described as
-- We don't currently use this for any tests in this file, but it is a
-- reasonable alternative definition for some scenarios.
It would still be in the git history if needed, so I'm not objecting.
On 5/8/21 3:44 PM, Tom Lane wrote:
Anyway, I propose that we ought to sneak this into HEAD, since
it's only touching test code and not anything production-critical.
The patch is a bit more invasive than I would have liked, because
adding the SQL definition of binary_coercible() to create_function_1
(where the other regress.c functions are declared) didn't work:
that runs after opr_sanity, and just moving it up to before
opr_sanity causes the latter to complain about some of the functions
in it. So I ended up splitting the create_function_1 test into
create_function_0 and create_function_1. It's annoying from a
parallelism standpoint that create_function_0 runs by itself, but
the two parallel groups ahead of it are already full. Maybe we
should rebalance that by moving a few of those tests to run in
parallel with create_function_0, but I didn't do that here.
Thoughts?
+1 for doing it now.
You could possibly just move "inet macaddr macaddr8 " to the following
group and so have room for create_function_0. I just tried that and it
seemed happy.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Julien Rouhaud <rjuju123@gmail.com> writes:
Looking at the patch, explicitly_binary_coercible wasn't used since
e9f42d529f990f94e1b7bdcec4a1111465c85326 (and was renamed there too). Just to
be sure, is it ok to remove it, as it was described as
-- We don't currently use this for any tests in this file, but it is a
-- reasonable alternative definition for some scenarios.
It would still be in the git history if needed, so I'm not objecting.
It's my own comment, so it doesn't scare me particularly ;-).
I think that
(a) it's unlikely we'll ever again need that old physically-coercible
check. That was a hangover from Berkeley-era type cheats, and I think
our standards are higher now. If somebody submits a patch that would
depend on such a cheat, I think our response would be "fix the patch",
not "it's okay to weaken the type-matching checks".
(b) if we did need it, we'd probably want an implementation like this
one (ie invoke some C code), both for speed and because it's hard to
make a plpgsql function's behavior match the C code's exactly.
regards, tom lane
On Sun, May 09, 2021 at 01:01:38PM -0400, Tom Lane wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
Looking at the patch, explicitly_binary_coercible wasn't used since
e9f42d529f990f94e1b7bdcec4a1111465c85326 (and was renamed there too). Just to
be sure, is it ok to remove it, as it was described as
-- We don't currently use this for any tests in this file, but it is a
-- reasonable alternative definition for some scenarios.
It would still be in the git history if needed, so I'm not objecting.
It's my own comment, so it doesn't scare me particularly ;-).
Yes, I saw that when digging in git history :)
I think that
(a) it's unlikely we'll ever again need that old physically-coercible
check. That was a hangover from Berkeley-era type cheats, and I think
our standards are higher now. If somebody submits a patch that would
depend on such a cheat, I think our response would be "fix the patch",
not "it's okay to weaken the type-matching checks".
(b) if we did need it, we'd probably want an implementation like this
one (ie invoke some C code), both for speed and because it's hard to
make a plpgsql function's behavior match the C code's exactly.
I quite agree with both. As I said I just wanted to mention it for extra
safety.
Hi,
On 2021-05-08 15:44:57 -0400, Tom Lane wrote:
In a nearby thread I bemoaned the fact that the core regression tests
seem to have gotten significantly slower in the last couple of months,
at least with CCA enabled: hyrax reports completing them in 12:52:44
on 18 March, while its most recent run on 1 May took 14:08:18.
Trying to diagnose the cause overall seemed a bit daunting, but
I thought I'd dig into the opr_sanity test in particular, as it
is one of the slower tests under CCA to start with and had also
slowed down noticeably (from 3701581 ms to 4761183 ms, or 28%).
I was able to complete a bisection using just that test, and
got an unexpected result: most of the slowdown appeared at
ab596105b (BRIN minmax-multi indexes). Apparently the additional
time is simply from having to check the additional pg_amop and
pg_amproc entries, which that patch added quite a few of.
I suspect it might not be just that. From a quick profile it looks
like debug_invalidate_system_caches_always spends a good chunk of its
time in ResetCatalogCache() and hash_seq_search(). Their cost scales
linearly with the size of the underlying hash tables.
So what might be happening is that the additional catalog entries
pushed some of the catcache hash tables into growing
(RehashCatCache()). Which then makes all subsequent ResetCatalogCache()
scans slower.
Not that that changes much - your proposed fix still seems reasonable.
I noticed that all of the slowest queries in that test were dependent
on the binary_coercible() plpgsql function that it uses. Now, that
function has always been a rather lame attempt to approximate the
behavior of the parser's IsBinaryCoercible() function, so I've been
thinking for some time that we ought to get rid of it in favor of
actually using IsBinaryCoercible(). I tried that, by adding a
shim function to regress.c, and got a most gratifying result:
on my machine opr_sanity's runtime with
debug_invalidate_system_caches_always = 1 drops from
29m9s to 3m19s. Without CCA the speedup is far less impressive,
360ms to 305ms, but that's still useful. Especially since this
makes the test strictly more accurate.
Cool!
Anyway, I propose that we ought to sneak this into HEAD, since
it's only touching test code and not anything production-critical.
+1
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-05-08 15:44:57 -0400, Tom Lane wrote:
I was able to complete a bisection using just that test, and
got an unexpected result: most of the slowdown appeared at
ab596105b (BRIN minmax-multi indexes). Apparently the additional
time is simply from having to check the additional pg_amop and
pg_amproc entries, which that patch added quite a few of.
I suspect it might not be just that. From a quick profile it looks
like debug_invalidate_system_caches_always spends a good chunk of its
time in ResetCatalogCache() and hash_seq_search(). Their cost scales
linearly with the size of the underlying hash tables.
So what might be happening is that the additional catalog entries
pushed some of the catcache hash tables into growing
(RehashCatCache()). Which then makes all subsequent ResetCatalogCache()
scans slower.
Hm. But constantly flushing the caches should mean that they're never
populated with very many entries at one time, which ought to forestall
that, at least to some extent.
I wonder if there's anything we could do to make ResetCatalogCache
faster? It wouldn't help much for normal execution of course,
but it might do something to bring CCA testing time down out of
the stratosphere.
regards, tom lane
Hi,
On 2021-05-10 14:06:16 -0400, Tom Lane wrote:
Hm. But constantly flushing the caches should mean that they're never
populated with very many entries at one time, which ought to forestall
that, at least to some extent.
That's probably true...
I wonder if there's anything we could do to make ResetCatalogCache
faster? It wouldn't help much for normal execution of course,
but it might do something to bring CCA testing time down out of
the stratosphere.
We could make the hashtables shrink, not just grow...
There's also the issue that most people, I assume, run CCA tests with -O0. In
a quick test that does make a big difference in e.g. ResetCatalogCache(). I
just added a function-specific annotation to optimize just that function and
the overall time in my test shrank 10% or so.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-05-10 14:06:16 -0400, Tom Lane wrote:
I wonder if there's anything we could do to make ResetCatalogCache
faster? It wouldn't help much for normal execution of course,
but it might do something to bring CCA testing time down out of
the stratosphere.
We could make the hashtables shrink, not just grow...
Maybe ...
There's also the issue that most people, I assume, run CCA tests with -O0. In
a quick test that does make a big difference in e.g. ResetCatalogCache(). I
just added a function-specific annotation to optimize just that function and
the overall time in my test shrank 10% or so.
If they do I think they're nuts ;-). CCA is slow enough already without
hobbling it.
hyrax appears to use the usual -O2, as does/did avocet. Not sure
if we have any other CCA buildfarm members right now.
regards, tom lane
Andres Freund <andres@anarazel.de> writes:
On 2021-05-10 14:06:16 -0400, Tom Lane wrote:
I wonder if there's anything we could do to make ResetCatalogCache
faster? It wouldn't help much for normal execution of course,
but it might do something to bring CCA testing time down out of
the stratosphere.
We could make the hashtables shrink, not just grow...
I noticed that we already have counters that can tell whether a
catcache or dynahash table is empty, so I experimented with the
attached patch. Testing one of the slow queries from privileges.sql
(which might not be very representative of the overall issue),
I see:
HEAD:
Time: 536429.715 ms (08:56.430)
with ResetCatalogCache hack:
Time: 488938.597 ms (08:08.939)
plus hash_seq_search hack:
Time: 475400.634 ms (07:55.401)
Of course, the issue with these patches is that they change these
counters from things that (I think) we only trust for statistical
purposes into things that had better be right or you're going to
have impossible-to-track-down bugs with sometimes failing to
invalidate cache entries. My gut feeling is that the risk-to-reward
ratio is worth it for changing ResetCatalogCache, but not for
hash_seq_search. This is partly because of the greater absolute
payback and partly because ResetCatalogCache doesn't deal with
shared data structures, reducing the risk of counting issues.
regards, tom lane
Attachments:
save-work-for-empty-data-structures-1.patch (text/x-diff; +18, -2)
Hi,
On 2021-05-10 16:17:18 -0400, Tom Lane wrote:
I noticed that we already have counters that can tell whether a
catcache or dynahash table is empty, so I experimented with the
attached patch. Testing one of the slow queries from privileges.sql
(which might not be very representative of the overall issue),
I see:HEAD:
Time: 536429.715 ms (08:56.430)with ResetCatalogCache hack:
Time: 488938.597 ms (08:08.939)plus hash_seq_search hack:
Time: 475400.634 ms (07:55.401)
Oh, nice.
Perhaps we generally ought to lower the initial syscache sizes further?
20cb18db4668 did that, but we still have things like PROCNAMEARGNSP,
PROCOID, RELNAMENSP, RELOID, STATRELATTINH, ... using 128 as the initial
size. Not hard to imagine that some of these are larger than what simple
workloads or CCA encounter.
Of course, the issue with these patches is that they change these
counters from things that (I think) we only trust for statistical
purposes into things that had better be right or you're going to
have impossible-to-track-down bugs with sometimes failing to
invalidate cache entries. My gut feeling is that the risk-to-reward
ratio is worth it for changing ResetCatalogCache, but not for
hash_seq_search. This is partly because of the greater absolute
payback and partly because ResetCatalogCache doesn't deal with
shared data structures, reducing the risk of counting issues.
That sounds reasonable. We could mitigate the risk for dynahash by
testing HASH_SHARED_MEM (which we don't store right now), but it's not
clear it's worth it here. But I wonder if there are other cases where it
could help? If we did make the check support shared memory *and*
partitioned tables, I could easily see it be a win for things like
LockReleaseAll().
Greetings,
Andres Freund
On Tue, May 11, 2021 at 8:52 AM Andres Freund <andres@anarazel.de> wrote:
... If we did make the check support shared memory *and*
partitioned tables, I could easily see it be a win for things like
LockReleaseAll().
For that case, has the idea of maintaining a dlist of local locks been
considered?
Hi,
On 2021-05-11 10:57:03 +1200, Thomas Munro wrote:
On Tue, May 11, 2021 at 8:52 AM Andres Freund <andres@anarazel.de> wrote:
... If we did make the check support shared memory *and*
partitioned tables, I could easily see it be a win for things like
LockReleaseAll().
Errr, that's not even a shared hashtable... So it would help even if we
just excluded shared memory hashtables.
For that case, has the idea of maintaining a dlist of local locks been
considered?
Yea, there's been a long discussion on that for
LockReleaseAll(). Combined with alternatives around shrinking the hashtable...
Greetings,
Andres Freund
Andrew Dunstan <andrew@dunslane.net> writes:
+1 for doing it now.
Pushed.
You could possibly just move "inet macaddr macaddr8 " to the following
group and so have room for create_function_0. I just tried that and it
seemed happy.
I decided that the minimum change would be to push tstypes to the
following group, so I did it like that.
regards, tom lane