Re: Recovering from detoast-related catcache invalidations

Started by Xiaoran Wang · over 2 years ago · 26 messages · hackers
#1 Xiaoran Wang
fanfuxiaoran@gmail.com

Hi,

BTW, while nosing around I found what seems like a very nasty related
bug. Suppose that a catalog tuple being loaded into syscache contains
some toasted fields. CatalogCacheCreateEntry will flatten the tuple,
involving fetches from toast tables that will certainly cause
AcceptInvalidationMessages calls. What if one of those should have
invalidated this tuple? We will not notice, because it's not in
the hashtable yet. When we do add it, we will mark it not-dead,
meaning that the stale entry looks fine and could persist for a long
while.

I spent some time trying to understand the bug, and I can finally reproduce
it locally with the following steps:

step 1:
create a function called 'test' with a long body that must be stored in a
toast table, and put it in schema 'yy' by: "alter function test set schema
yy";

step 2:
I added a breakpoint at 'toast_flatten_tuple' in session1,
then executed the following SQL:
----------
set search_path='public';
alter function test set schema xx;
----------
step 3:
when session1 stops at the breakpoint, I open session2 and execute
-----------
set search_path = 'yy';
alter function test set schema public;
-----------
step 4:
resume session1; it reports the error "ERROR: could not find a
function named "test""

step 5:
continue to execute "alter function test set schema xx;" in session1, but
it still does not work and reports the above error, although the function
'test' already belongs to schema 'public'.

Obviously, in session 1, the "test" proc tuple in the cache is outdated.

The detection of "get an invalidation" could be refined: what I did
here is to check for any advance of SharedInvalidMessageCounter,
which clearly will have a significant number of false positives.
However, the only way I see to make that a lot better is to
temporarily create a placeholder catcache entry (probably a negative
one) with the same keys, and then see if it got marked dead.
This seems a little expensive, plus I'm afraid that it'd be actively
wrong in the recursive-lookup cases that the existing comment in
SearchCatCacheMiss is talking about (that is, the catcache entry
might mislead any recursive lookup that happens).

I have reviewed your patch, and it looks good. But instead of checking for
any advance of SharedInvalidMessageCounter (which is a little expensive if
the invalidation message is not related to the current tuple), I have
another idea: we can recheck the visibility of the tuple with the
CatalogSnapshot (the CatalogSnapshot must be refreshed if there are any
SharedInvalidMessages); if it is not visible, we re-fetch the tuple,
otherwise we can continue to use it, as it is not outdated.

I added a commit based on your patch and attached it.

Attachments:

0001-Recheck-the-tuple-visibility.patch (application/octet-stream) +5 -3

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Xiaoran Wang (#1)

Xiaoran Wang <fanfuxiaoran@gmail.com> writes:

The detection of "get an invalidation" could be refined: what I did
here is to check for any advance of SharedInvalidMessageCounter,
which clearly will have a significant number of false positives.

I have reviewed your patch, and it looks good. But instead of checking for
any advance of SharedInvalidMessageCounter (which is a little expensive if
the invalidation message is not related to the current tuple), I have
another idea: we can recheck the visibility of the tuple with the
CatalogSnapshot (the CatalogSnapshot must be refreshed if there are any
SharedInvalidMessages); if it is not visible, we re-fetch the tuple,
otherwise we can continue to use it, as it is not outdated.

Maybe, but that undocumented hack in SetHintBits seems completely
unacceptable. Isn't there a cleaner way to make this check?

Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
than GetCatalogSnapshot is the right thing, because the catcaches
use the latter.

regards, tom lane

#3 Xiaoran Wang
fanfuxiaoran@gmail.com
In reply to: Tom Lane (#2)

Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
than GetCatalogSnapshot is the right thing, because the catcaches
use the latter.

Yes, you are right; we should use GetCatalogSnapshot here.

Maybe, but that undocumented hack in SetHintBits seems completely
unacceptable. Isn't there a cleaner way to make this check?

Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check if the
tuple has been deleted. As the tuple's xmin must have been committed, we
just need to check whether its xmax is committed, like below:

------------
@@ -1956,9 +1956,11 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
                 */
                if (HeapTupleHasExternal(ntp))
                {
+                       TransactionId xmax;
                        dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
-                       if (!HeapTupleSatisfiesVisibility(ntp, GetNonHistoricCatalogSnapshot(cache->cc_reloid), InvalidBuffer))
+                       xmax = HeapTupleHeaderGetUpdateXid(ntp->t_data);
+                       if (TransactionIdIsValid(xmax) && TransactionIdDidCommit(xmax))
                        {
                                heap_freetuple(dtp);
                                return NULL;
------------

I'm not quite sure the code is correct; I cannot clearly understand
'HeapTupleHeaderGetUpdateXid' and need more time to dive into it.

Any thoughts?

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Xiaoran Wang (#3)

Xiaoran Wang <fanfuxiaoran@gmail.com> writes:

Maybe, but that undocumented hack in SetHintBits seems completely
unacceptable. Isn't there a cleaner way to make this check?

Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check if the
tuple has been deleted.
As the tuple's xmin must have been committed, we just need to check whether
its xmax is committed,

I'm not super thrilled with that. Something I realized last night is
that your proposal only works if "ntp" is pointing directly into the
catalog's disk buffers. If something earlier than this code had made
a local-memory copy of the catalog tuple, then it's possible that its
header fields (particularly xmax) are out of date compared to shared
buffers and would fail to tell us that some other process just
invalidated the tuple. Now in fact, with the current implementation
of systable_getnext() the result is actually a live tuple and so we
can expect to see any relevant updates. But I think we'd better add
some Asserts that that's so; and that also provides us with a way to
call HeapTupleSatisfiesVisibility fully legally, because we can get
the buffer reference out of the scan descriptor too.

This is uncomfortably much in bed with the tuple table slot code,
perhaps, but I don't see a way to do it more cleanly unless we want
to add some new provisions to that API. Andres, do you have any
thoughts about that?

Anyway, this approach gets rid of false positives, which is great
for performance and bad for testing. Code coverage says that now
we never hit the failure paths during regression tests, which is
unsurprising, but I'm not very comfortable with leaving those paths
unexercised. I tried to make an isolation test to exercise them,
but there's no good way at the SQL level to get a session to block
during the detoast step. LOCK TABLE on some catalog's toast table
would do, but we disallow it. I thought about adding a small C
function to regress.so to take out such a lock, but we have no
infrastructure for referencing regress.so from isolation tests.
What I ended up doing is adding a random failure about 0.1% of
the time in USE_ASSERT_CHECKING builds --- that's intellectually
ugly for sure, but doing better seems like way more work than
it's worth.

regards, tom lane

Attachments:

v3-fix-stale-catcache-entries.patch (text/x-diff) +135 -40

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)

I wrote:

This is uncomfortably much in bed with the tuple table slot code,
perhaps, but I don't see a way to do it more cleanly unless we want
to add some new provisions to that API. Andres, do you have any
thoughts about that?

Oh! After nosing around a bit more I remembered systable_recheck_tuple,
which is meant for exactly this purpose. So v4 attached.

regards, tom lane

Attachments:

v4-fix-stale-catcache-entries.patch (text/x-diff) +124 -40

#6 Xiaoran Wang
fanfuxiaoran@gmail.com
In reply to: Tom Lane (#5)

Great! That's what exactly we need.

The patch LGTM, +1

#7 Xiaoran Wang
fanfuxiaoran@gmail.com
In reply to: Xiaoran Wang (#6)

Hmm, how about first checking whether any shared invalidation messages have
been accepted, and then rechecking the tuple's visibility?
If no invalidation message was accepted during 'toast_flatten_tuple',
there is no need to do the visibility check, which can save several
CPU cycles.

----
if (inval_count != SharedInvalidMessageCounter &&
    !systable_recheck_tuple(scandesc, ntp))
{
    heap_freetuple(dtp);
    return NULL;
}
----

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Xiaoran Wang (#7)

Xiaoran Wang <fanfuxiaoran@gmail.com> writes:

Hmm, how about first checking whether any shared invalidation messages have
been accepted, and then rechecking the tuple's visibility?
If no invalidation message was accepted during 'toast_flatten_tuple',
there is no need to do the visibility check, which can save several
CPU cycles.

Meh, I'd just as soon not add the additional dependency/risk of bugs.
This is an expensive and seldom-taken code path, so I don't think
shaving a few cycles is really important.

regards, tom lane

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)

I wrote:

Xiaoran Wang <fanfuxiaoran@gmail.com> writes:

Hmm, how about first checking whether any shared invalidation messages have
been accepted, and then rechecking the tuple's visibility?
If no invalidation message was accepted during 'toast_flatten_tuple',
there is no need to do the visibility check, which can save several
CPU cycles.

Meh, I'd just as soon not add the additional dependency/risk of bugs.
This is an expensive and seldom-taken code path, so I don't think
shaving a few cycles is really important.

It occurred to me that this idea might be more interesting if we
could encapsulate it right into systable_recheck_tuple: something
like having systable_beginscan capture the current
SharedInvalidMessageCounter and save it in the SysScanDesc struct,
then compare in systable_recheck_tuple to possibly short-circuit
that work. This'd eliminate one of the main bug hazards in the
idea, namely that you might capture SharedInvalidMessageCounter too
late, after something's already happened. However, the whole idea
only works for catalogs that have catcaches, and the other users of
systable_recheck_tuple are interested in pg_depend which doesn't.
So that put a damper on my enthusiasm for the idea.

regards, tom lane

#10 Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#5)

On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:

I wrote:

This is uncomfortably much in bed with the tuple table slot code,
perhaps, but I don't see a way to do it more cleanly unless we want
to add some new provisions to that API. Andres, do you have any
thoughts about that?

Oh! After nosing around a bit more I remembered systable_recheck_tuple,
which is meant for exactly this purpose. So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages. The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the systable_recheck_tuple()
part will change again before the heap_inplace_update() story is over
(/messages/by-id/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).

#11 Xiaoran Wang
fanfuxiaoran@gmail.com
In reply to: Tom Lane (#9)

This is an interesting idea.
Although some catalog tables are not covered by catcaches, such as
pg_depend, when scanning them, if there is any SharedInvalidationMessage,
the CatalogSnapshot will be invalidated and recreated (see
"RelationInvalidatesSnapshotsOnly" in syscache.c).
Maybe during the systable scan, it receives the SharedInvalidationMessages
and returns tuples that are out of date; systable_recheck_tuple is used in
dependency.c for such a case.

#12 Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#10)

On Sun, Jan 14, 2024 at 12:14:11PM -0800, Noah Misch wrote:

On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:

Oh! After nosing around a bit more I remembered systable_recheck_tuple,
which is meant for exactly this purpose. So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages. The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the systable_recheck_tuple()
part will change again before the heap_inplace_update() story is over
(/messages/by-id/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).

Commit f9f47f0 (2024-06-27) addressed inplace updates here.

#13 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noah Misch (#12)

On 25/09/2024 00:20, Noah Misch wrote:

On Sun, Jan 14, 2024 at 12:14:11PM -0800, Noah Misch wrote:

On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:

Oh! After nosing around a bit more I remembered systable_recheck_tuple,
which is meant for exactly this purpose. So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages. The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the systable_recheck_tuple()
part will change again before the heap_inplace_update() story is over
(/messages/by-id/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com).

Commit f9f47f0 (2024-06-27) addressed inplace updates here.

I started to wonder if there's still an issue with catcache list
entries. The code to build a CatCList looks like this:

SearchCatCacheList()
    systable_beginscan()
    while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
    {
        ct = CatalogCacheCreateEntry(ntp);
        if (ct == NULL)
        {
            /* 'ntp' was concurrently invalidated, start all over */
        }
    }
    systable_endscan();

    /* create CatCList entry */

CatalogCacheCreateEntry() can accept catcache invalidations when it
opens the toast table, and it now has recheck logic to detect the case
that the tuple it's processing (ntp) is invalidated. However, isn't it
also possible that it accepts an invalidation message for a tuple that
we had processed in an earlier iteration of the loop? Or that a new
catalog tuple was inserted that should be part of the list we're building?

--
Heikki Linnakangas
Neon (https://neon.tech)

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#13)

Heikki Linnakangas <hlinnaka@iki.fi> writes:

CatalogCacheCreateEntry() can accept catcache invalidations when it
opens the toast table, and it now has recheck logic to detect the case
that the tuple it's processing (ntp) is invalidated. However, isn't it
also possible that it accepts an invalidation message for a tuple that
we had processed in an earlier iteration of the loop? Or that a new
catalog tuple was inserted that should be part of the list we're building?

The expectation is that the list will be built and returned to the
caller, but it's already marked as stale so it will be rebuilt
on next request.

We could consider putting a loop around that, but (a) it might loop a
lot of times, and (b) returning a stale list isn't much different from
the situation where the list-invalidating event arrives a nanosecond
after we finish rather than a nanosecond before. Ultimately it's the
caller's responsibility that the returned list be consistent enough
for its purposes. It might achieve that by first taking a lock on a
related table, for example.

regards, tom lane

#15 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#14)

On 13/12/2024 17:30, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

CatalogCacheCreateEntry() can accept catcache invalidations when it
opens the toast table, and it now has recheck logic to detect the case
that the tuple it's processing (ntp) is invalidated. However, isn't it
also possible that it accepts an invalidation message for a tuple that
we had processed in an earlier iteration of the loop? Or that a new
catalog tuple was inserted that should be part of the list we're building?

The expectation is that the list will be built and returned to the
caller, but it's already marked as stale so it will be rebuilt
on next request.

Ah, you mean this at the end:

/* mark list dead if any members already dead */
if (ct->dead)
    cl->dead = true;

Ok, I missed that. It does not handle the 2nd scenario though: If a new
catalog tuple is concurrently inserted that should be part of the list,
it is missed.

I was able to reproduce that, by pausing a process with gdb while it's
building the list in SearchCatCacheList():

1. Create a function called foofunc(integer). It must be large so that
its pg_proc tuple is toasted.

2. In one backend, run "SELECT foofunc(1)". It calls
FuncnameGetCandidates() which calls
"SearchSysCacheList1(PROCNAMEARGSNSP, CStringGetDatum(funcname));". Put
a break point in SearchCatCacheList() just after the systable_beginscan().

3. In another backend, create function foofunc() with no args.

4. continue execution from the breakpoint.

5. Run "SELECT foofunc()" in the first session. It fails to find the
function. The error persists: it will fail to find that function if you
try again, until the syscache is invalidated again for some reason.

Attached is an injection point test case to reproduce that. If you
change the test so that the function's body is shorter, so that it's not
toasted, the test passes.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Demonstrate-catcache-list-invalidation-bug.patch (text/x-patch) +100 -1

#16 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#15)

On 14/12/2024 02:06, Heikki Linnakangas wrote:

Ok, I missed that. It does not handle the 2nd scenario though: If a new
catalog tuple is concurrently inserted that should be part of the list,
it is missed.

I was able to reproduce that, by pausing a process with gdb while it's
building the list in SearchCatCacheList():

1. Create a function called foofunc(integer). It must be large so that
its pg_proc tuple is toasted.

2. In one backend, run "SELECT foofunc(1)". It calls
FuncnameGetCandidates() which calls
"SearchSysCacheList1(PROCNAMEARGSNSP, CStringGetDatum(funcname));". Put
a break point in SearchCatCacheList() just after the systable_beginscan().

3. In another backend, create function foofunc() with no args.

4. continue execution from the breakpoint.

5. Run "SELECT foofunc()" in the first session. It fails to find the
function. The error persists: it will fail to find that function if you
try again, until the syscache is invalidated again for some reason.

Attached is an injection point test case to reproduce that. If you
change the test so that the function's body is shorter, so that it's not
toasted, the test passes.

I'm thinking of the attached to fix this. It changes the strategy for
detecting concurrent cache invalidations: instead of the "recheck"
mechanism introduced in commit ad98fb1422, keep a stack of "build
in-progress" entries, and have CatCacheInvalidate() invalidate those
"in-progress" entries in addition to the actual CatCTup and CatCList
entries.

My first attempt was to insert the CatCTup or CatCList entry to the
catcache before starting to build it, marked with a flag to indicate
that the entry isn't fully built yet. But when I started to write that
it got pretty invasive, and it felt easier to add another structure to
hold the in-progress entries instead.

(I'm not sure I got the 'volatile' markers on the local variables right
in this patch; before committing I'll need to refresh my memory on the
rules for PG_TRY() and local variables. Also, I'm not sure I want to
commit the test with the injection point, but it's useful now to
demonstrate the bug.)

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Demonstrate-catcache-list-invalidation-bug.patch (text/x-patch) +100 -1
0001-Don-t-allow-GetTransactionSnapshot-in-logical-decodi.patch (text/x-patch) +4 -9

#17 Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#16)

On Tue, Dec 24, 2024 at 12:18:09AM +0200, Heikki Linnakangas wrote:

My first attempt was to insert the CatCTup or CatCList entry to the catcache
before starting to build it, marked with a flag to indicate that the entry
isn't fully built yet. But when I started to write that it got pretty
invasive, and it felt easier to add another structure to hold the
in-progress entries instead.

From ec248c69cb42a0747ecc6a63ac4e4682cce2ee93 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 20 Dec 2024 18:37:44 +0200
Subject: [PATCH 1/1] Don't allow GetTransactionSnapshot() in logical decoding

It seems to me that this is not what you intended to attach for the
catcache inconsistency fix?
--
Michael

#18 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Michael Paquier (#17)

On 24/12/2024 09:38, Michael Paquier wrote:

On Tue, Dec 24, 2024 at 12:18:09AM +0200, Heikki Linnakangas wrote:

My first attempt was to insert the CatCTup or CatCList entry to the catcache
before starting to build it, marked with a flag to indicate that the entry
isn't fully built yet. But when I started to write that it got pretty
invasive, and it felt easier to add another structure to hold the
in-progress entries instead.

From ec248c69cb42a0747ecc6a63ac4e4682cce2ee93 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 20 Dec 2024 18:37:44 +0200
Subject: [PATCH 1/1] Don't allow GetTransactionSnapshot() in logical decoding

It seems to me that this is not what you intended to attach for the
catcache inconsistency fix?

Right, sorry, here are the correct patches.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v2-0001-Demonstrate-catcache-list-invalidation-bug.patch (text/x-patch) +100 -1
v2-0002-Fix-bug-in-catcache-invalidation.patch (text/x-patch) +112 -75

#19 Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#16)

On Tue, Dec 24, 2024 at 12:18:09AM +0200, Heikki Linnakangas wrote:

On 14/12/2024 02:06, Heikki Linnakangas wrote:

Ok, I missed that. It does not handle the 2nd scenario though: If a new
catalog tuple is concurrently inserted that should be part of the list,
it is missed.

Attached is an injection point test case to reproduce that. If you
change the test so that the function's body is shorter, so that it's not
toasted, the test passes.

Nice discovery.

I'm thinking of the attached to fix this. It changes the strategy for
detecting concurrent cache invalidations. Instead of the "recheck" mechanism
that was introduced in commit ad98fb1422, keep a stack of "build
in-progress" entries, and have CatCacheInvalidate() invalidate those
"in-progress" entries in addition to the actual CatCTup and CatCList
entries.

My first attempt was to insert the CatCTup or CatCList entry to the catcache
before starting to build it, marked with a flag to indicate that the entry
isn't fully built yet. But when I started to write that it got pretty
invasive, and it felt easier to add another structure to hold the
in-progress entries instead.

That's similar to how relcache has been doing it (in_progress_list). I see no
problem applying that technique here.

not sure I want to
commit the test with the injection point, but it's useful now to demonstrate
the bug.)

I'd err on the side of including it. Apart from some copied comments, the
test looks ready.

On Wed, Dec 25, 2024 at 11:27:38AM +0200, Heikki Linnakangas wrote:

+++ b/src/test/modules/test_misc/t/007_bugs.pl

test_misc/t/007_bugs.pl could be home to almost anything. How about naming it
007_catcache_inval.pl?

@@ -744,6 +770,13 @@ ResetCatalogCache(CatCache *cache)
#endif
}
}
+
+	/* Also invalidate any entries that are being built */
+	for (CatCCreating *e = catcache_creating_stack; e != NULL; e = e->next)
+	{
+		if (e->cache == cache)
+			e->dead = true;
+	}
}

With debug_discard_caches=1, "make check" hangs early with some INSERT using
100% CPU. The new test file does likewise. I bet this needs a special case
to short-circuit debug_discard_caches, like RelationCacheInvalidate() has.

@@ -1665,6 +1698,8 @@ SearchCatCacheList(CatCache *cache,
HeapTuple ntp;
MemoryContext oldcxt;
int i;
+ volatile CatCCreating creating_list;

You could drop the volatile by copying catcache_creating_stack to an
additional var that you reference after longjmp, instead of referencing
creating_list.next after longjmp. Likewise for the instance of volatile in
CatalogCacheCreateEntry().
https://pubs.opengroup.org/onlinepubs/9799919799/functions/longjmp.html has
the rules. Using volatile is fine, though.

+ bool first_iter = true;

This is okay as non-volatile, but it could just as easily move inside the
PG_TRY.

@@ -2076,38 +2118,33 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,

+			PG_TRY();
{
-				matches = equalTuple(before, ntp);
-				heap_freetuple(before);
+				dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);

gcc 4.8.5 warns:

catcache.c: In function ‘CatalogCacheCreateEntry’:
catcache.c:2159:29: warning: ‘dtp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ct->tuple.t_tableOid = dtp->t_tableOid;

I remember some commit of a gcc 4.x warning fix in a recent year, and we do
have buildfarm representation. Consider silencing it.

The rest looks good. Thanks.

#20 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noah Misch (#19)

On 07/01/2025 23:56, Noah Misch wrote:

On Tue, Dec 24, 2024 at 12:18:09AM +0200, Heikki Linnakangas wrote:

I'm thinking of the attached to fix this. It changes the strategy for
detecting concurrent cache invalidations. Instead of the "recheck" mechanism
that was introduced in commit ad98fb1422, keep a stack of "build
in-progress" entries, and have CatCacheInvalidate() invalidate those
"in-progress" entries in addition to the actual CatCTup and CatCList
entries.

My first attempt was to insert the CatCTup or CatCList entry to the catcache
before starting to build it, marked with a flag to indicate that the entry
isn't fully built yet. But when I started to write that it got pretty
invasive, and it felt easier to add another structure to hold the
in-progress entries instead.

That's similar to how relcache has been doing it (in_progress_list). I see no
problem applying that technique here.

Oh thanks, I didn't notice that in relcache.c. I adjusted the naming and
comments in the new catcache.c code to be a little closer to the
relcache.c version, just to make it a bit more consistent.

The main difference is that relcache.c uses an array and an end-of-xact
callback to reset the array on error, while my implementation uses a
linked list of entries allocated on stack and PG_TRY-CATCH for error
cleanup. I considered adopting relcache.c's approach for the sake of
consistency, but decided to keep my original approach in the end. Some
of the code in catcache.c already had a suitable PG_TRY-CATCH block and
I didn't want to add a new end-of-xact callback just for this.

not sure I want to
commit the test with the injection point, but it's useful now to demonstrate
the bug.)

I'd err on the side of including it. Apart from some copied comments, the
test looks ready.

Ok, I cleaned up the comments.

On Wed, Dec 25, 2024 at 11:27:38AM +0200, Heikki Linnakangas wrote:

+++ b/src/test/modules/test_misc/t/007_bugs.pl

test_misc/t/007_bugs.pl could be home to almost anything. How about naming it
007_catcache_inval.pl?

Renamed

@@ -744,6 +770,13 @@ ResetCatalogCache(CatCache *cache)
#endif
}
}
+
+	/* Also invalidate any entries that are being built */
+	for (CatCCreating *e = catcache_creating_stack; e != NULL; e = e->next)
+	{
+		if (e->cache == cache)
+			e->dead = true;
+	}
}

With debug_discard_caches=1, "make check" hangs early with some INSERT using
100% CPU. The new test file does likewise. I bet this needs a special case
to short-circuit debug_discard_caches, like RelationCacheInvalidate() has.

Added

@@ -1665,6 +1698,8 @@ SearchCatCacheList(CatCache *cache,
HeapTuple ntp;
MemoryContext oldcxt;
int i;
+ volatile CatCCreating creating_list;

You could drop the volatile by copying catcache_creating_stack to an
additional var that you reference after longjmp, instead of referencing
creating_list.next after longjmp. Likewise for the instance of volatile in
CatalogCacheCreateEntry().
https://pubs.opengroup.org/onlinepubs/9799919799/functions/longjmp.html has
the rules. Using volatile is fine, though.

Thanks for the tips. I used the additional var, it seems a little nicer.

+ bool first_iter = true;

This is okay as non-volatile, but it could just as easily move inside the
PG_TRY.

Done

@@ -2076,38 +2118,33 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,

+			PG_TRY();
{
-				matches = equalTuple(before, ntp);
-				heap_freetuple(before);
+				dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);

gcc 4.8.5 warns:

catcache.c: In function ‘CatalogCacheCreateEntry’:
catcache.c:2159:29: warning: ‘dtp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ct->tuple.t_tableOid = dtp->t_tableOid;

I remember some commit of a gcc 4.x warning fix in a recent year, and we do
have buildfarm representation. Consider silencing it.

Hmm, I guess the compiler doesn't see that it's initialized with all the
PG_TRY()/CATCH() stuff. I initialized it to NULL to silence the warning.

Review of this new version is much appreciated, but if I don't hear
anything I'll commit and backpatch this.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v3-0001-Fix-catcache-invalidation-of-a-list-entry-that-s-.patch (text/x-patch) +248 -80
#21 Noah Misch
noah@leadboat.com
In reply to: Heikki Linnakangas (#20)
#22 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Noah Misch (#21)
#23 Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#22)
#24 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#23)
#25 Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Andres Freund (#24)
#26 Andres Freund
andres@anarazel.de
In reply to: Arseniy Mukhin (#25)