Re: Recovering from detoast-related catcache invalidations

Started by Xiaoran Wang · over 2 years ago · 26 messages · hackers
#1 Xiaoran Wang
fanfuxiaoran@gmail.com

Hi,

BTW, while nosing around I found what seems like a very nasty related
bug. Suppose that a catalog tuple being loaded into syscache contains
some toasted fields. CatalogCacheCreateEntry will flatten the tuple,
involving fetches from toast tables that will certainly cause
AcceptInvalidationMessages calls. What if one of those should have
invalidated this tuple? We will not notice, because it's not in
the hashtable yet. When we do add it, we will mark it not-dead,
meaning that the stale entry looks fine and could persist for a long
while.

I spent some time trying to understand the bug, and I can finally reproduce
it locally with the following steps:

step 1:
create a function called 'test' with a long body that must be stored in a
toast table, and put it in schema 'yy' by: "alter function test set schema
yy";

step 2:
I added a breakpoint at 'toast_flatten_tuple' in session1,
then executed the following SQL:
----------
set search_path='public';
alter function test set schema xx;
----------
step 3:
when session1 stops at the breakpoint, I open session2 and execute
-----------
set search_path = 'yy';
alter function test set schema public;
-----------
step 4:
resume session1; it reports the error "ERROR: could not find a
function named "test""

step 5:
continue to execute "alter function test set schema xx;" in session1, but
it still does not work and reports the above error, although the function
'test' already belongs to schema 'public'.

Obviously, in session 1, the "test" proc tuple in the cache is outdated.

The detection of "get an invalidation" could be refined: what I did
here is to check for any advance of SharedInvalidMessageCounter,
which clearly will have a significant number of false positives.
However, the only way I see to make that a lot better is to
temporarily create a placeholder catcache entry (probably a negative
one) with the same keys, and then see if it got marked dead.
This seems a little expensive, plus I'm afraid that it'd be actively
wrong in the recursive-lookup cases that the existing comment in
SearchCatCacheMiss is talking about (that is, the catcache entry
might mislead any recursive lookup that happens).

I have reviewed your patch, and it looks good. But instead of checking for
any advance of SharedInvalidMessageCounter (which is a little expensive if
the invalidation message is not related to the current tuple), I have
another idea: we can recheck the visibility of the tuple with the
CatalogSnapshot (the CatalogSnapshot must be refreshed if there are any
SharedInvalidMessages); if it is not visible, we re-fetch the tuple,
otherwise we can continue to use it, as it is not outdated.

I added a commit based on your patch and attached it.

Attachments:

0001-Recheck-the-tuple-visibility.patch (application/octet-stream) +5 -3

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Xiaoran Wang (#1)

Xiaoran Wang <fanfuxiaoran@gmail.com> writes:

The detection of "get an invalidation" could be refined: what I did
here is to check for any advance of SharedInvalidMessageCounter,
which clearly will have a significant number of false positives.

I have reviewed your patch, and it looks good. But instead of checking for
any advance of SharedInvalidMessageCounter (which is a little expensive if
the invalidation message is not related to the current tuple), I have
another idea: we can recheck the visibility of the tuple with the
CatalogSnapshot (the CatalogSnapshot must be refreshed if there are any
SharedInvalidMessages); if it is not visible, we re-fetch the tuple,
otherwise we can continue to use it, as it is not outdated.

Maybe, but that undocumented hack in SetHintBits seems completely
unacceptable. Isn't there a cleaner way to make this check?

Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
than GetCatalogSnapshot is the right thing, because the catcaches
use the latter.

regards, tom lane

#3 Xiaoran Wang
fanfuxiaoran@gmail.com
In reply to: Tom Lane (#2)

Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
than GetCatalogSnapshot is the right thing, because the catcaches
use the latter.

Yes, you are right; we should use GetCatalogSnapshot here.

Maybe, but that undocumented hack in SetHintBits seems completely
unacceptable. Isn't there a cleaner way to make this check?

Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check if the
tuple has been deleted. As the tuple's xmin must have been committed, we
just need to check whether its xmax is committed, like below:

------------
@@ -1956,9 +1956,11 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
                 */
                if (HeapTupleHasExternal(ntp))
                {
+                       TransactionId xmax;
                        dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
-                       if (!HeapTupleSatisfiesVisibility(ntp, GetNonHistoricCatalogSnapshot(cache->cc_reloid), InvalidBuffer))
+                       xmax = HeapTupleHeaderGetUpdateXid(ntp->t_data);
+                       if (TransactionIdIsValid(xmax) && TransactionIdDidCommit(xmax))
                        {
                                heap_freetuple(dtp);
                                return NULL;
------------

I'm not quite sure the code is correct; I cannot clearly understand
'HeapTupleHeaderGetUpdateXid' and need more time to dive into it.

Any thoughts?

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Xiaoran Wang (#3)

Xiaoran Wang <fanfuxiaoran@gmail.com> writes:

Maybe, but that undocumented hack in SetHintBits seems completely
unacceptable. Isn't there a cleaner way to make this check?

Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check if the
tuple has been deleted.
As the tuple's xmin must have been committed, we just need to check whether
its xmax is committed,

I'm not super thrilled with that. Something I realized last night is
that your proposal only works if "ntp" is pointing directly into the
catalog's disk buffers. If something earlier than this code had made
a local-memory copy of the catalog tuple, then it's possible that its
header fields (particularly xmax) are out of date compared to shared
buffers and would fail to tell us that some other process just
invalidated the tuple. Now in fact, with the current implementation
of systable_getnext() the result is actually a live tuple and so we
can expect to see any relevant updates. But I think we'd better add
some Asserts that that's so; and that also provides us with a way to
call HeapTupleSatisfiesVisibility fully legally, because we can get
the buffer reference out of the scan descriptor too.

This is uncomfortably much in bed with the tuple table slot code,
perhaps, but I don't see a way to do it more cleanly unless we want
to add some new provisions to that API. Andres, do you have any
thoughts about that?

Anyway, this approach gets rid of false positives, which is great
for performance and bad for testing. Code coverage says that now
we never hit the failure paths during regression tests, which is
unsurprising, but I'm not very comfortable with leaving those paths
unexercised. I tried to make an isolation test to exercise them,
but there's no good way at the SQL level to get a session to block
during the detoast step. LOCK TABLE on some catalog's toast table
would do, but we disallow it. I thought about adding a small C
function to regress.so to take out such a lock, but we have no
infrastructure for referencing regress.so from isolation tests.
What I ended up doing is adding a random failure about 0.1% of
the time in USE_ASSERT_CHECKING builds --- that's intellectually
ugly for sure, but doing better seems like way more work than
it's worth.

regards, tom lane

Attachments:

v3-fix-stale-catcache-entries.patch (text/x-diff) +135 -40

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)

I wrote:

This is uncomfortably much in bed with the tuple table slot code,
perhaps, but I don't see a way to do it more cleanly unless we want
to add some new provisions to that API. Andres, do you have any
thoughts about that?

Oh! After nosing around a bit more I remembered systable_recheck_tuple,
which is meant for exactly this purpose. So v4 attached.

regards, tom lane

Attachments:

v4-fix-stale-catcache-entries.patch (text/x-diff) +124 -40

#6 Xiaoran Wang
fanfuxiaoran@gmail.com
In reply to: Tom Lane (#5)

Great! That's what exactly we need.

The patch LGTM, +1

#7 Xiaoran Wang
fanfuxiaoran@gmail.com
In reply to: Xiaoran Wang (#6)

Hmm, how about first checking whether any shared invalidation messages have
been accepted, and then rechecking the tuple's visibility?
If no invalidation message was accepted during 'toast_flatten_tuple',
there is no need to do the visibility check, which can save several
CPU cycles.

----
if (inval_count != SharedInvalidMessageCounter &&
    !systable_recheck_tuple(scandesc, ntp))
{
    heap_freetuple(dtp);
    return NULL;
}
----

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Xiaoran Wang (#7)

Xiaoran Wang <fanfuxiaoran@gmail.com> writes:

Hmm, how about first checking whether any shared invalidation messages have
been accepted, and then rechecking the tuple's visibility?
If no invalidation message was accepted during 'toast_flatten_tuple',
there is no need to do the visibility check, which can save several
CPU cycles.

Meh, I'd just as soon not add the additional dependency/risk of bugs.
This is an expensive and seldom-taken code path, so I don't think
shaving a few cycles is really important.

regards, tom lane

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)

I wrote:

Xiaoran Wang <fanfuxiaoran@gmail.com> writes:

Hmm, how about first checking whether any shared invalidation messages have
been accepted, and then rechecking the tuple's visibility?
If no invalidation message was accepted during 'toast_flatten_tuple',
there is no need to do the visibility check, which can save several
CPU cycles.

Meh, I'd just as soon not add the additional dependency/risk of bugs.
This is an expensive and seldom-taken code path, so I don't think
shaving a few cycles is really important.

It occurred to me that this idea might be more interesting if we
could encapsulate it right into systable_recheck_tuple: something
like having systable_beginscan capture the current
SharedInvalidMessageCounter and save it in the SysScanDesc struct,
then compare in systable_recheck_tuple to possibly short-circuit
that work. This'd eliminate one of the main bug hazards in the
idea, namely that you might capture SharedInvalidMessageCounter too
late, after something's already happened. However, the whole idea
only works for catalogs that have catcaches, and the other users of
systable_recheck_tuple are interested in pg_depend which doesn't.
So that put a damper on my enthusiasm for the idea.

regards, tom lane

#10 Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#5)

On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:

I wrote:

This is uncomfortably much in bed with the tuple table slot code,
perhaps, but I don't see a way to do it more cleanly unless we want
to add some new provisions to that API. Andres, do you have any
thoughts about that?

Oh! After nosing around a bit more I remembered systable_recheck_tuple,
which is meant for exactly this purpose. So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages. The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the systable_recheck_tuple()
part will change again before the heap_inplace_update() story is over
(/messages/by-id/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).

#11 Xiaoran Wang
fanfuxiaoran@gmail.com
In reply to: Tom Lane (#9)

This is an interesting idea.
Although some catalog tables are not covered by catcaches, such as
pg_depend, when scanning them, if there is any SharedInvalidationMessage,
the CatalogSnapshot will be invalidated and recreated (see
"RelationInvalidatesSnapshotsOnly" in syscache.c).
Maybe during the systable scan, it receives the SharedInvalidationMessages
and returns tuples that are out of date; systable_recheck_tuple is used in
dependency.c for such a case.

#12 Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#10)

On Sun, Jan 14, 2024 at 12:14:11PM -0800, Noah Misch wrote:

On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:

Oh! After nosing around a bit more I remembered systable_recheck_tuple,
which is meant for exactly this purpose. So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages. The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the systable_recheck_tuple()
part will change again before the heap_inplace_update() story is over
(/messages/by-id/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).

Commit f9f47f0 (2024-06-27) addressed inplace updates here.

#13 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noah Misch (#12)

On 25/09/2024 00:20, Noah Misch wrote:

On Sun, Jan 14, 2024 at 12:14:11PM -0800, Noah Misch wrote:

On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:

Oh! After nosing around a bit more I remembered systable_recheck_tuple,
which is meant for exactly this purpose. So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages. The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the systable_recheck_tuple()
part will change again before the heap_inplace_update() story is over
(/messages/by-id/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).

Commit f9f47f0 (2024-06-27) addressed inplace updates here.

I started to wonder if there's still an issue with catcache list
entries. The code to build a CatCList looks like this:

SearchCatCacheList()
    systable_beginscan()
    while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
    {
        ct = CatalogCacheCreateEntry(ntp);
        if (ct == NULL)
        {
            /* 'ntp' was concurrently invalidated, start all over */
        }
    }
    systable_endscan();

    /* create CatCList entry */

CatalogCacheCreateEntry() can accept catcache invalidations when it
opens the toast table, and it now has recheck logic to detect the case
that the tuple it's processing (ntp) is invalidated. However, isn't it
also possible that it accepts an invalidation message for a tuple that
we had processed in an earlier iteration of the loop? Or that a new
catalog tuple was inserted that should be part of the list we're building?

--
Heikki Linnakangas
Neon (https://neon.tech)

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#13)

Heikki Linnakangas <hlinnaka@iki.fi> writes:

CatalogCacheCreateEntry() can accept catcache invalidations when it
opens the toast table, and it now has recheck logic to detect the case
that the tuple it's processing (ntp) is invalidated. However, isn't it
also possible that it accepts an invalidation message for a tuple that
we had processed in an earlier iteration of the loop? Or that a new
catalog tuple was inserted that should be part of the list we're building?

The expectation is that the list will be built and returned to the
caller, but it's already marked as stale so it will be rebuilt
on next request.

We could consider putting a loop around that, but (a) it might loop a
lot of times, and (b) returning a stale list isn't much different from
the situation where the list-invalidating event arrives a nanosecond
after we finish rather than a nanosecond before. Ultimately it's the
caller's responsibility that the returned list be consistent enough
for its purposes. It might achieve that by first taking a lock on a
related table, for example.

regards, tom lane

#15 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#14)

On 13/12/2024 17:30, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

CatalogCacheCreateEntry() can accept catcache invalidations when it
opens the toast table, and it now has recheck logic to detect the case
that the tuple it's processing (ntp) is invalidated. However, isn't it
also possible that it accepts an invalidation message for a tuple that
we had processed in an earlier iteration of the loop? Or that a new
catalog tuple was inserted that should be part of the list we're building?

The expectation is that the list will be built and returned to the
caller, but it's already marked as stale so it will be rebuilt
on next request.

Ah, you mean this at the end:

/* mark list dead if any members already dead */
if (ct->dead)
    cl->dead = true;

Ok, I missed that. It does not handle the 2nd scenario though: If a new
catalog tuple is concurrently inserted that should be part of the list,
it is missed.

I was able to reproduce that, by pausing a process with gdb while it's
building the list in SearchCatCacheList():

1. Create a function called foofunc(integer). It must be large so that
its pg_proc tuple is toasted.

2. In one backend, run "SELECT foofunc(1)". It calls
FuncnameGetCandidates() which calls
"SearchSysCacheList1(PROCNAMEARGSNSP, CStringGetDatum(funcname));". Put
a break point in SearchCatCacheList() just after the systable_beginscan().

3. In another backend, create function foofunc() with no args.

4. continue execution from the breakpoint.

5. Run "SELECT foofunc()" in the first session. It fails to find the
function. The error persists: it will fail to find that function if you
try again, until the syscache is invalidated again for some reason.

Attached is an injection point test case to reproduce that. If you
change the test so that the function's body is shorter, so that it's not
toasted, the test passes.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Demonstrate-catcache-list-invalidation-bug.patch (text/x-patch) +100 -1

#16 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#15)

On 14/12/2024 02:06, Heikki Linnakangas wrote:

Ok, I missed that. It does not handle the 2nd scenario though: If a new
catalog tuple is concurrently inserted that should be part of the list,
it is missed.

I was able to reproduce that, by pausing a process with gdb while it's
building the list in SearchCatCacheList():

1. Create a function called foofunc(integer). It must be large so that
its pg_proc tuple is toasted.

2. In one backend, run "SELECT foofunc(1)". It calls
FuncnameGetCandidates() which calls
"SearchSysCacheList1(PROCNAMEARGSNSP, CStringGetDatum(funcname));". Put
a break point in SearchCatCacheList() just after the systable_beginscan().

3. In another backend, create function foofunc() with no args.

4. continue execution from the breakpoint.

5. Run "SELECT foofunc()" in the first session. It fails to find the
function. The error persists: it will fail to find that function if you
try again, until the syscache is invalidated again for some reason.

Attached is an injection point test case to reproduce that. If you
change the test so that the function's body is shorter, so that it's not
toasted, the test passes.

I'm thinking of the attached to fix this. It changes the strategy for
detecting concurrent cache invalidations: instead of the "recheck"
mechanism introduced in commit ad98fb1422, keep a stack of "build
in-progress" entries, and have CatCacheInvalidate() invalidate those
"in-progress" entries in addition to the actual CatCTup and CatCList
entries.

My first attempt was to insert the CatCTup or CatCList entry to the
catcache before starting to build it, marked with a flag to indicate
that the entry isn't fully built yet. But when I started to write that
it got pretty invasive, and it felt easier to add another structure to
hold the in-progress entries instead.

(I'm not sure I got the 'volatile' markers on the local variables right
in this patch; before committing I'll need to refresh my memory on the
rules for PG_TRY() and local variables. Also, I'm not sure I want to
commit the test with the injection point, but it's useful now to
demonstrate the bug.)

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Demonstrate-catcache-list-invalidation-bug.patch (text/x-patch) +100 -1
0001-Don-t-allow-GetTransactionSnapshot-in-logical-decodi.patch (text/x-patch) +4 -9

#17 Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#16)

On Tue, Dec 24, 2024 at 12:18:09AM +0200, Heikki Linnakangas wrote:

My first attempt was to insert the CatCTup or CatCList entry to the catcache
before starting to build it, marked with a flag to indicate that the entry
isn't fully built yet. But when I started to write that it got pretty
invasive, and it felt easier to add another structure to hold the
in-progress entries instead.

From ec248c69cb42a0747ecc6a63ac4e4682cce2ee93 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 20 Dec 2024 18:37:44 +0200
Subject: [PATCH 1/1] Don't allow GetTransactionSnapshot() in logical decoding

It seems to me that this is not what you intended to attach for the
catcache inconsistency fix?
--
Michael

#18 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Michael Paquier (#17)

On 24/12/2024 09:38, Michael Paquier wrote:

On Tue, Dec 24, 2024 at 12:18:09AM +0200, Heikki Linnakangas wrote:

My first attempt was to insert the CatCTup or CatCList entry to the catcache
before starting to build it, marked with a flag to indicate that the entry
isn't fully built yet. But when I started to write that it got pretty
invasive, and it felt easier to add another structure to hold the
in-progress entries instead.

From ec248c69cb42a0747ecc6a63ac4e4682cce2ee93 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 20 Dec 2024 18:37:44 +0200
Subject: [PATCH 1/1] Don't allow GetTransactionSnapshot() in logical decoding

It seems to me that this is not what you intended to attach for the
catcache inconsistency fix?

Right, sorry, here are the correct patches.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v2-0001-Demonstrate-catcache-list-invalidation-bug.patch (text/x-patch) +100 -1
v2-0002-Fix-bug-in-catcache-invalidation.patch (text/x-patch) +112 -75

#19 Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#16)

On Tue, Dec 24, 2024 at 12:18:09AM +0200, Heikki Linnakangas wrote:

On 14/12/2024 02:06, Heikki Linnakangas wrote:

Ok, I missed that. It does not handle the 2nd scenario though: If a new
catalog tuple is concurrently inserted that should be part of the list,
it is missed.

Attached is an injection point test case to reproduce that. If you
change the test so that the function's body is shorter, so that it's not
toasted, the test passes.

Nice discovery.

I'm thinking of the attached to fix this. It changes the strategy for
detecting concurrent cache invalidations. Instead of the "recheck" mechanism
that was introduced in commit ad98fb1422, keep a stack of "build
in-progress" entries, and have CatCacheInvalidate() invalidate those
"in-progress" entries in addition to the actual CatCTup and CatCList
entries.

My first attempt was to insert the CatCTup or CatCList entry to the catcache
before starting to build it, marked with a flag to indicate that the entry
isn't fully built yet. But when I started to write that it got pretty
invasive, and it felt easier to add another structure to hold the
in-progress entries instead.

That's similar to how relcache has been doing it (in_progress_list). I see no
problem applying that technique here.

not sure I want to
commit the test with the injection point, but it's useful now to demonstrate
the bug.)

I'd err on the side of including it. Apart from some copied comments, the
test looks ready.

On Wed, Dec 25, 2024 at 11:27:38AM +0200, Heikki Linnakangas wrote:

+++ b/src/test/modules/test_misc/t/007_bugs.pl

test_misc/t/007_bugs.pl could be home to almost anything. How about naming it
007_catcache_inval.pl?

@@ -744,6 +770,13 @@ ResetCatalogCache(CatCache *cache)
#endif
}
}
+
+	/* Also invalidate any entries that are being built */
+	for (CatCCreating *e = catcache_creating_stack; e != NULL; e = e->next)
+	{
+		if (e->cache == cache)
+			e->dead = true;
+	}
}

With debug_discard_caches=1, "make check" hangs early with some INSERT using
100% CPU. The new test file does likewise. I bet this needs a special case
to short-circuit debug_discard_caches, like RelationCacheInvalidate() has.

@@ -1665,6 +1698,8 @@ SearchCatCacheList(CatCache *cache,
HeapTuple ntp;
MemoryContext oldcxt;
int i;
+ volatile CatCCreating creating_list;

You could drop the volatile by copying catcache_creating_stack to an
additional var that you reference after longjmp, instead of referencing
creating_list.next after longjmp. Likewise for the instance of volatile in
CatalogCacheCreateEntry().
https://pubs.opengroup.org/onlinepubs/9799919799/functions/longjmp.html has
the rules. Using volatile is fine, though.

+ bool first_iter = true;

This is okay as non-volatile, but it could just as easily move inside the
PG_TRY.

@@ -2076,38 +2118,33 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,

+			PG_TRY();
{
-				matches = equalTuple(before, ntp);
-				heap_freetuple(before);
+				dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);

gcc 4.8.5 warns:

catcache.c: In function ‘CatalogCacheCreateEntry’:
catcache.c:2159:29: warning: ‘dtp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ct->tuple.t_tableOid = dtp->t_tableOid;

I remember some commit of a gcc 4.x warning fix in a recent year, and we do
have buildfarm representation. Consider silencing it.

The rest looks good. Thanks.

#20 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noah Misch (#19)

On 07/01/2025 23:56, Noah Misch wrote:

On Tue, Dec 24, 2024 at 12:18:09AM +0200, Heikki Linnakangas wrote:

I'm thinking of the attached to fix this. It changes the strategy for
detecting concurrent cache invalidations. Instead of the "recheck" mechanism
that was introduced in commit ad98fb1422, keep a stack of "build
in-progress" entries, and have CatCacheInvalidate() invalidate those
"in-progress" entries in addition to the actual CatCTup and CatCList
entries.

My first attempt was to insert the CatCTup or CatCList entry to the catcache
before starting to build it, marked with a flag to indicate that the entry
isn't fully built yet. But when I started to write that it got pretty
invasive, and it felt easier to add another structure to hold the
in-progress entries instead.

That's similar to how relcache has been doing it (in_progress_list). I see no
problem applying that technique here.

Oh thanks, I didn't notice that in relcache.c. I adjusted the naming and
comments in the new catcache.c code to be a little closer to the
relcache.c version, just to make it a bit more consistent.

The main difference is that relcache.c uses an array and an end-of-xact
callback to reset the array on error, while my implementation uses a
linked list of entries allocated on stack and PG_TRY-CATCH for error
cleanup. I considered adopting relcache.c's approach for the sake of
consistency, but decided to keep my original approach in the end. Some
of the code in catcache.c already had a suitable PG_TRY-CATCH block and
I didn't want to add a new end-of-xact callback just for this.

not sure I want to
commit the test with the injection point, but it's useful now to demonstrate
the bug.)

I'd err on the side of including it. Apart from some copied comments, the
test looks ready.

Ok, I cleaned up the comments.

On Wed, Dec 25, 2024 at 11:27:38AM +0200, Heikki Linnakangas wrote:

+++ b/src/test/modules/test_misc/t/007_bugs.pl

test_misc/t/007_bugs.pl could be home to almost anything. How about naming it
007_catcache_inval.pl?

Renamed

@@ -744,6 +770,13 @@ ResetCatalogCache(CatCache *cache)
#endif
}
}
+
+	/* Also invalidate any entries that are being built */
+	for (CatCCreating *e = catcache_creating_stack; e != NULL; e = e->next)
+	{
+		if (e->cache == cache)
+			e->dead = true;
+	}
}

With debug_discard_caches=1, "make check" hangs early with some INSERT using
100% CPU. The new test file does likewise. I bet this needs a special case
to short-circuit debug_discard_caches, like RelationCacheInvalidate() has.

Added

@@ -1665,6 +1698,8 @@ SearchCatCacheList(CatCache *cache,
HeapTuple ntp;
MemoryContext oldcxt;
int i;
+ volatile CatCCreating creating_list;

You could drop the volatile by copying catcache_creating_stack to an
additional var that you reference after longjmp, instead of referencing
creating_list.next after longjmp. Likewise for the instance of volatile in
CatalogCacheCreateEntry().
https://pubs.opengroup.org/onlinepubs/9799919799/functions/longjmp.html has
the rules. Using volatile is fine, though.

Thanks for the tips. I used the additional var, it seems a little nicer.

+ bool first_iter = true;

This is okay as non-volatile, but it could just as easily move inside the
PG_TRY.

Done

@@ -2076,38 +2118,33 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,

+			PG_TRY();
{
-				matches = equalTuple(before, ntp);
-				heap_freetuple(before);
+				dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);

gcc 4.8.5 warns:

catcache.c: In function ‘CatalogCacheCreateEntry’:
catcache.c:2159:29: warning: ‘dtp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ct->tuple.t_tableOid = dtp->t_tableOid;

I remember some commit of a gcc 4.x warning fix in a recent year, and we do
have buildfarm representation. Consider silencing it.

Hmm, I guess the compiler doesn't see that it's initialized with all the
PG_TRY()/CATCH() stuff. I initialized it to NULL to silence the warning.

Review of this new version is much appreciated, but if I don't hear
anything I'll commit and backpatch this.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v3-0001-Fix-catcache-invalidation-of-a-list-entry-that-s-.patch (text/x-patch) +248 -80
#21 Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#20)
#22 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noah Misch (#21)
#23 Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#22)
#24 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#23)
#25 Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Andres Freund (#24)
#26 Andres Freund
andres@anarazel.de
In reply to: Arseniy Mukhin (#25)