Recovering from detoast-related catcache invalidations
In bug #18163 [1]/messages/by-id/18163-859bad19a43edcf6@postgresql.org, Alexander proved the misgivings I had in [2]/messages/by-id/1389919.1697144487@sss.pgh.pa.us
about catcache detoasting being possibly unsafe:
BTW, while nosing around I found what seems like a very nasty related
bug. Suppose that a catalog tuple being loaded into syscache contains
some toasted fields. CatalogCacheCreateEntry will flatten the tuple,
involving fetches from toast tables that will certainly cause
AcceptInvalidationMessages calls. What if one of those should have
invalidated this tuple? We will not notice, because it's not in
the hashtable yet. When we do add it, we will mark it not-dead,
meaning that the stale entry looks fine and could persist for a long
while.
Attached is a POC patch for fixing this. The idea is that if we get
an invalidation while trying to detoast a catalog tuple, we should
go around and re-read the tuple a second time to get an up-to-date
version (or, possibly, discover that it no longer exists). In the
case of SearchCatCacheList, we have to drop and reload the entire
catcache list, but fortunately not a lot of new code is needed.
The detection of "get an invalidation" could be refined: what I did
here is to check for any advance of SharedInvalidMessageCounter,
which clearly will have a significant number of false positives.
However, the only way I see to make that a lot better is to
temporarily create a placeholder catcache entry (probably a negative
one) with the same keys, and then see if it got marked dead.
This seems a little expensive, plus I'm afraid that it'd be actively
wrong in the recursive-lookup cases that the existing comment in
SearchCatCacheMiss is talking about (that is, the catcache entry
might mislead any recursive lookup that happens).
Moreover, if we did do something like that then the new code
paths would be essentially untested. As the patch stands, the
reload path seems to get taken 10 to 20 times during a
"make installcheck-parallel" run of the core regression tests
(out of about 150 times that catcache detoasting is required).
Probably all of those are false-positive cases, but at least
they're exercising the logic.
So I'm inclined to leave it like this, but perhaps somebody
else will have a different opinion.
(BTW, there's a fair amount of existing catcache.c code that
will need to be indented another tab stop, but in the interests
of keeping the patch legible I didn't do that yet.)
Comments?
regards, tom lane
[1]: /messages/by-id/18163-859bad19a43edcf6@postgresql.org
[2]: /messages/by-id/1389919.1697144487@sss.pgh.pa.us
Attachments:
v1-fix-stale-catcache-entries.patchtext/x-diff; charset=us-ascii; name=v1-fix-stale-catcache-entries.patchDownload+92-20
I wrote:
In bug #18163 [1], Alexander proved the misgivings I had in [2]
about catcache detoasting being possibly unsafe:
...
Attached is a POC patch for fixing this.
The cfbot pointed out that this needed a rebase. No substantive
changes.
regards, tom lane