Failures in constraints regression test, "read only 0 of 8192 bytes"
These two animals seem to have got mixed up about the size of
this relation in the same place:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet&dt=2024-02-28%2017%3A34%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2024-03-01%2006%3A47%3A53
+++ /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/regress/results/constraints.out
2024-03-01 08:22:11.624897033 +0100
@@ -573,42 +573,38 @@
UNIQUE (i) DEFERRABLE INITIALLY DEFERRED;
BEGIN;
INSERT INTO unique_tbl VALUES (1, 'five');
+ERROR: could not read blocks 0..0 in file "base/16384/21437": read
only 0 of 8192 bytes
That error message changed slightly in my smgrreadv() commit a couple
of months ago (it would have been "block 0" and now it's "blocks 0..0"
because now we can read more than one block at a time) but I don't
immediately see how anything at that low level could be responsible
for this.
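For readers not familiar with that message: it is raised when the kernel returns
fewer bytes than a full block for a block range the server expected to exist,
i.e. the file on disk is shorter than the relation is supposed to be. A minimal
standalone sketch of the condition (not PostgreSQL code; block size and wording
only approximate the real report):

/*
 * Standalone illustration (not PostgreSQL source) of the condition behind
 * "could not read blocks N..N ...: read only 0 of 8192 bytes": the file is
 * shorter than the caller believes, so pread() comes back short.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192                 /* PostgreSQL's default block size */

int
main(void)
{
    char        buf[BLCKSZ];
    unsigned    blocknum = 0;
    int         fd = open("empty_relation_file", O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return 1;

    /* The file is zero bytes long, but we ask for block 0 anyway. */
    ssize_t     nread = pread(fd, buf, BLCKSZ, (off_t) blocknum * BLCKSZ);

    if (nread >= 0 && nread < BLCKSZ)
        fprintf(stderr,
                "could not read blocks %u..%u: read only %zd of %d bytes\n",
                blocknum, blocknum, nread, BLCKSZ);

    close(fd);
    return 0;
}

In the failing runs the file really was shorter than the relation should have
been, which is what the rest of the thread digs into.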
These are "my" animals (running at a local university). There's a couple
interesting details:
1) the animals run on the same machine (one with gcc, one with clang)
2) I did upgrade the OS and restarted the machine on 2024/02/26, i.e.
right before the failures started
These might be just coincidences, but maybe something got broken by the
upgrade ... OTOH it's weird it'd affect just HEAD and none of the other
branches, and with two different compilers.
Just to be sure I removed the buildroot, in case there's something wrong
with ccache. It's a wild guess, but I don't have any other idea.
regards
On 3/2/24 22:39, Thomas Munro wrote:
These two animals seem to have got mixed up about the size of
this relation in the same place:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet&dt=2024-02-28%2017%3A34%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2024-03-01%2006%3A47%3A53
+++ /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/regress/results/constraints.out
2024-03-01 08:22:11.624897033 +0100
@@ -573,42 +573,38 @@
UNIQUE (i) DEFERRABLE INITIALLY DEFERRED;
BEGIN;
INSERT INTO unique_tbl VALUES (1, 'five');
+ERROR: could not read blocks 0..0 in file "base/16384/21437": read
only 0 of 8192 bytes

That error message changed slightly in my smgrreadv() commit a couple
of months ago (it would have been "block 0" and now it's "blocks 0..0"
because now we can read more than one block at a time) but I don't
immediately see how anything at that low level could be responsible
for this.
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Saturday, March 2, 2024 at 23:29:52 CET, Tomas Vondra wrote:
These are "my" animals (running at a local university). There are a couple
of interesting details:
Hi Tomas,
do you still have the failing cluster data?
Noah pointed me to this thread, and it looks a bit similar to the FSM
corruption issue I'm facing:
https://www.postgresql.org/message-id/1925490.taCxCBeP46%40aivenlaptop
So if you still have the data, it would be nice to see if you indeed have a
corrupted FSM, and if you have any indication of when it happened.
Best regards,
--
Ronan Dunklau
On 3/4/24 14:16, Ronan Dunklau wrote:
On Saturday, March 2, 2024 at 23:29:52 CET, Tomas Vondra wrote:
These are "my" animals (running at a local university). There are a couple
of interesting details:

Hi Tomas,

do you still have the failing cluster data?

Noah pointed me to this thread, and it looks a bit similar to the FSM
corruption issue I'm facing:
https://www.postgresql.org/message-id/1925490.taCxCBeP46%40aivenlaptop

So if you still have the data, it would be nice to see if you indeed have a
corrupted FSM, and if you have any indication of when it happened.
Sorry, I nuked the buildroot so I don't have the data anymore. Let's see
if it fails again.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Happened again. I see this is OpenSUSE. Does that mean the file
system is Btrfs?
On 3/8/24 09:33, Thomas Munro wrote:
Happened again. I see this is OpenSUSE. Does that mean the file
system is Btrfs?
It is, but I don't think that matters - I've been able to reproduce this
locally on my laptop using ext4 filesystem. I'd bet the important piece
here is -DCLOBBER_CACHE_ALWAYS (and it seems avocet/trilobite are the
only animals running with this).
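For context on why that define matters here: with -DCLOBBER_CACHE_ALWAYS, every
point at which a backend would merely accept pending invalidations instead
throws the caches away wholesale, so anything holding a stale relcache or smgr
pointer across such a point gets caught almost immediately. Roughly (heavily
simplified, not the exact inval.c source, and the guard shown is an assumption
about the build option's spelling):

/*
 * Sketch of what a CLOBBER_CACHE_ALWAYS build effectively does: besides
 * applying whatever sinval messages are queued, every call behaves as if
 * the sinval queue had overflowed, discarding the caches entirely.
 */
void
AcceptInvalidationMessages(void)
{
    /* Normal path: apply queued invalidations; reset everything on overflow. */
    ReceiveSharedInvalidMessages(LocalExecuteInvalidationMessage,
                                 InvalidateSystemCaches);

#ifdef CLOBBER_CACHE_ALWAYS
    /* Debug path: pretend the queue overflowed every single time. */
    InvalidateSystemCachesExtended(true);
#endif
}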
Also, if this really is a filesystem (or environment) issue, it seems
very strange it'd only affect HEAD and not the other branches. So it
seems quite likely this is actually triggered by a commit.
Looking at the commits from the good/bad runs, I see this:
avocet: good=4c2369a bad=f5a465f
trilobite: good=d13ff82 bad=def0ce3
That means the commit would have to be somewhere here:
f5a465f1a07 Promote assertion about !ReindexIsProcessingIndex to ...
57b28c08305 Doc: fix minor typos in two ECPG function descriptions.
28e858c0f95 Improve documentation and GUC description for ...
a661bf7b0f5 Remove flaky isolation tests for timeouts
874d817baa1 Multiple revisions to the GROUP BY reordering tests
466979ef031 Replace lateral references to removed rels in subqueries
a6b2a51e16d Avoid dangling-pointer problem with partitionwise ...
d360e3cc60e Fix compiler warning on typedef redeclaration
8af25652489 Introduce a new smgr bulk loading facility.
e612384fc78 Fix mistake in SQL features list
d13ff82319c Fix BF failure in commit 93db6cbda0.
My guess would be 8af25652489, as it's the only storage-related commit.
I'm currently running tests to verify this.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 3/8/24 13:21, Tomas Vondra wrote:
On 3/8/24 09:33, Thomas Munro wrote:
Happened again. I see this is OpenSUSE. Does that mean the file
system is Btrfs?

It is, but I don't think that matters - I've been able to reproduce this
locally on my laptop using ext4 filesystem. I'd bet the important piece
here is -DCLOBBER_CACHE_ALWAYS (and it seems avocet/trilobite are the
only animals running with this).

Also, if this really is a filesystem (or environment) issue, it seems
very strange it'd only affect HEAD and not the other branches. So it
seems quite likely this is actually triggered by a commit.

Looking at the commits from the good/bad runs, I see this:

avocet: good=4c2369a bad=f5a465f
trilobite: good=d13ff82 bad=def0ce3

That means the commit would have to be somewhere here:

f5a465f1a07 Promote assertion about !ReindexIsProcessingIndex to ...
57b28c08305 Doc: fix minor typos in two ECPG function descriptions.
28e858c0f95 Improve documentation and GUC description for ...
a661bf7b0f5 Remove flaky isolation tests for timeouts
874d817baa1 Multiple revisions to the GROUP BY reordering tests
466979ef031 Replace lateral references to removed rels in subqueries
a6b2a51e16d Avoid dangling-pointer problem with partitionwise ...
d360e3cc60e Fix compiler warning on typedef redeclaration
8af25652489 Introduce a new smgr bulk loading facility.
e612384fc78 Fix mistake in SQL features list
d13ff82319c Fix BF failure in commit 93db6cbda0.

My guess would be 8af25652489, as it's the only storage-related commit.
I'm currently running tests to verify this.
Yup, the breakage starts with this commit. I haven't looked into the
root cause, or whether the commit maybe just made some pre-existing
issue easier to hit. Also, I haven't followed the discussion on the
pgsql-bugs thread [1], maybe there are some interesting findings.

[1]: /messages/by-id/1878547.tdWV9SEqCh@aivenlaptop
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Friday, March 8, 2024 at 14:36:48 CET, Tomas Vondra wrote:
My guess would be 8af25652489, as it's the only storage-related commit.
I'm currently running tests to verify this.
Yup, the breakage starts with this commit. I haven't looked into the
root cause, or whether the commit maybe just made some pre-existing
issue easier to hit. Also, I haven't followed the discussion on the
pgsql-bugs thread [1], maybe there are some interesting findings.
If that happens only on HEAD and not on 16, and doesn't involve WAL replay,
then it's not the same bug.
--
Ronan Dunklau
On Sat, Mar 9, 2024 at 2:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
On 3/8/24 13:21, Tomas Vondra wrote:
My guess would be 8af25652489, as it's the only storage-related commit.
I'm currently running tests to verify this.
Yup, the breakage starts with this commit. I haven't looked into the
root cause, or whether the commit maybe just made some pre-existing
issue easier to hit. Also, I haven't followed the discussion on the
pgsql-bugs thread [1], maybe there are some interesting findings.
Adding Heikki.
On 3/8/24 21:29, Thomas Munro wrote:
On Sat, Mar 9, 2024 at 2:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
On 3/8/24 13:21, Tomas Vondra wrote:
My guess would be 8af25652489, as it's the only storage-related commit.
I'm currently running tests to verify this.
Yup, the breakage starts with this commit. I haven't looked into the
root cause, or whether the commit maybe just made some pre-existing
issue easier to hit. Also, I haven't followed the discussion on the
pgsql-bugs thread [1], maybe there are some interesting findings.

Adding Heikki.
I spent a bit of time investigating this today, but haven't made much
progress due to (a) my unfamiliarity with the smgr code in general and
the patch in particular, and (b) CLOBBER_CACHE_ALWAYS making it quite
time consuming to iterate and experiment.
However, the smallest schedule that still reproduces the issue is:
-------------------
test: test_setup
test: create_aggregate create_function_sql create_cast constraints
triggers select inherit typed_table vacuum drop_if_exists
updatable_views roleattributes create_am hash_func errors infinite_recurse
-------------------
I tried to reduce the second step to just "constraints", but that does
not fail. Clearly there's some concurrency at play, and having just a
single backend makes that go away.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Mar 9, 2024 at 9:48 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
I spent a bit of time investigating this today, but haven't made much
progress due to (a) my unfamiliarity with the smgr code in general and
the patch in particular, and (b) CLOBBER_CACHE_ALWAYS making it quite
time consuming to iterate and experiment.

However, the smallest schedule that still reproduces the issue is:
-------------------
test: test_setup
test: create_aggregate create_function_sql create_cast constraints
triggers select inherit typed_table vacuum drop_if_exists
updatable_views roleattributes create_am hash_func errors infinite_recurse
-------------------
Thanks, reproduced here (painfully slowly). Looking...
Huh, also look at these extra problems later in the logs of the latest
trilobite and avocet runs, further down after the short read errors:
TRAP: failed Assert("j > attnum"), File: "heaptuple.c", Line: 640, PID: 15753
postgres: autovacuum worker regression(ExceptionalCondition+0x67)[0x9c5f37]
postgres: autovacuum worker regression[0x4b60c8]
postgres: autovacuum worker regression[0x5ff735]
postgres: autovacuum worker regression[0x5fe468]
postgres: autovacuum worker regression(analyze_rel+0x133)[0x5fd5e3]
postgres: autovacuum worker regression(vacuum+0x6b6)[0x683926]
postgres: autovacuum worker regression[0x7ce5e3]
postgres: autovacuum worker regression[0x7cc4f0]
postgres: autovacuum worker regression(StartAutoVacWorker+0x22)[0x7cc152]
postgres: autovacuum worker regression[0x7d57d1]
postgres: autovacuum worker regression[0x7d37bf]
postgres: autovacuum worker regression[0x6f5a4f]
Then crash recovery fails, in one case with:
2024-03-07 20:28:18.632 CET [15860:4] WARNING: will not overwrite a used ItemId
2024-03-07 20:28:18.632 CET [15860:5] CONTEXT: WAL redo at 0/FB07A48
for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610,
blk 12
2024-03-07 20:28:18.632 CET [15860:6] PANIC: failed to add tuple
2024-03-07 20:28:18.632 CET [15860:7] CONTEXT: WAL redo at 0/FB07A48
for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610,
blk 12
... and in another with:
2024-03-05 11:51:27.992 CET [25510:4] PANIC: invalid lp
2024-03-05 11:51:27.992 CET [25510:5] CONTEXT: WAL redo at 0/F87A8D0
for Heap/INPLACE: off: 29; blkref #0: rel 1663/16384/20581, blk 0
On Sun, Mar 10, 2024 at 5:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:
Thanks, reproduced here (painfully slowly). Looking...
I changed that ERROR to a PANIC and now I can see that
_bt_metaversion() is failing to read a meta page (block 0), and the
file is indeed of size 0 in my filesystem. Which is not cool, for a
btree. Looking at btbuildempty(), we have this sequence:
bulkstate = smgr_bulk_start_rel(index, INIT_FORKNUM);
/* Construct metapage. */
metabuf = smgr_bulk_get_buf(bulkstate);
_bt_initmetapage((Page) metabuf, P_NONE, 0, allequalimage);
smgr_bulk_write(bulkstate, BTREE_METAPAGE, metabuf, true);
smgr_bulk_finish(bulkstate);
Ooh. One idea would be that the smgr lifetime stuff is b0rked,
introducing corruption. Bulk write itself isn't pinning the smgr
relation, it's relying purely on the object not being invalidated,
which the theory of 21d9c3ee's commit message allowed for but ... here
it's destroyed (HASH_REMOVE'd) sooner under CLOBBER_CACHE_ALWAYS,
which I think we failed to grok. If that's it, I'm surprised that
things don't implode more spectacularly. Perhaps HASH_REMOVE should
clobber objects in debug builds, similar to pfree?
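The clobber-on-remove idea, as a standalone toy rather than a dynahash patch
(names invented for the example): fill the removed entry with a recognizable
pattern, the way CLOBBER_FREED_MEMORY does for pfree, so any later dereference
of a stale pointer trips over obvious garbage instead of silently reading
plausible-looking data.

/*
 * Toy illustration of "clobber on remove": overwrite a removed cache entry
 * with a sentinel byte pattern so that code still holding a pointer to it
 * fails loudly.  All names here are invented for the example.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct ToyEntry
{
    uint32_t    key;
    int         fd;             /* pretend this is a cached file descriptor */
} ToyEntry;

static ToyEntry table[4];       /* stand-in for a hash table's entry storage */

static void
toy_remove(ToyEntry *entry)
{
#ifdef USE_ASSERT_CHECKING      /* debug builds only, like CLOBBER_FREED_MEMORY */
    memset(entry, 0x7F, sizeof(*entry));
#endif
    /* (a real hash table would also unlink the entry here) */
}

int
main(void)
{
    ToyEntry   *stale = &table[0];

    stale->key = 42;
    stale->fd = 3;

    toy_remove(stale);          /* entry is gone, but we still hold "stale" */

    /* With clobbering enabled, the bogus value makes the bug obvious at once. */
    printf("fd after remove: %d\n", stale->fd);
    assert(stale->fd == 3);     /* fails in a clobbering (assert-enabled) build */
    return 0;
}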
For that hypothesis, the corruption might not be happening in the
above-quoted code itself, because it doesn't seem to have an
invalidation acceptance point (unless I'm missing it). Some other
bulk write got mixed up? Not sure yet.
I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).
On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).
Ahhh, on second thoughts, I take that back, I think the original
theory still actually works just fine. It's just that somewhere in
our refactoring of that commit, when we were vacillating between
different semantics for 'destroy' and 'release', I think we made a
mistake: in RelationCacheInvalidate() I think we should now call
smgrreleaseall(), not smgrdestroyall(). That satisfies the
requirements for sinval queue overflow: we close md.c segments (and
most importantly virtual file descriptors), so our lack of sinval
records can't hurt us, we'll reopen all files as required. That's
what CLOBBER_CACHE_ALWAYS is effectively testing (and more). But the
smgr pointer remains valid, and retains only its "identity", eg hash
table key, and that's also fine. It won't be destroyed until after
the end of the transaction. Which was the point, and it allows things
like bulk_write.c (and streaming_read.c) to hold an smgr reference.
Right?
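The distinction being relied on there can be shown with a standalone toy (not
the real smgr code): "release" closes the kernel resources but keeps the struct
and its identity alive, so a holder of a bare pointer, like bulk_write.c, just
causes a transparent reopen on next access; "destroy" frees the struct and
leaves that holder dangling.

/*
 * Toy contrast of the two operations (not the real smgr code).
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct ToyRel
{
    char        path[64];       /* the handle's identity, like the hash key */
    int         fd;             /* cached kernel resource, -1 if closed */
} ToyRel;

static ToyRel *
toy_open(const char *path)
{
    ToyRel     *rel = malloc(sizeof(ToyRel));

    snprintf(rel->path, sizeof(rel->path), "%s", path);
    rel->fd = -1;               /* opened lazily on first access */
    return rel;
}

static void
toy_release(ToyRel *rel)
{
    if (rel->fd >= 0)
        close(rel->fd);
    rel->fd = -1;               /* identity survives; next access reopens */
}

static void
toy_destroy(ToyRel *rel)
{
    toy_release(rel);
    free(rel);                  /* anyone still holding "rel" now dangles */
}

static int
toy_access_ok(ToyRel *rel)
{
    if (rel->fd < 0)
        rel->fd = open(rel->path, O_RDONLY);    /* lazy reopen */
    return rel->fd >= 0;
}

int
main(void)
{
    ToyRel     *held = toy_open("/etc/hostname");   /* a long-lived reference */

    toy_release(held);          /* fine: the handle reopens transparently */
    printf("after release: %s\n", toy_access_ok(held) ? "ok" : "failed");

    toy_destroy(held);          /* from here on, any use of "held" is a bug */
    return 0;
}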
Thanks for diagnosing this!
On 10/03/2024 08:23, Thomas Munro wrote:
On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).

Ahhh, on second thoughts, I take that back, I think the original
theory still actually works just fine. It's just that somewhere in
our refactoring of that commit, when we were vacillating between
different semantics for 'destroy' and 'release', I think we made a
mistake: in RelationCacheInvalidate() I think we should now call
smgrreleaseall(), not smgrdestroyall().
Yes, I ran the reproducer with 'rr', and came to the same conclusion.
That smgrdestroyall() call destroys the SmgrRelation, breaking
the new assumption that an unpinned SmgrRelation is valid until end of
transaction.
That satisfies the
requirements for sinval queue overflow: we close md.c segments (and
most importantly virtual file descriptors), so our lack of sinval
records can't hurt us, we'll reopen all files as required. That's
what CLOBBER_CACHE_ALWAYS is effectively testing (and more). But the
smgr pointer remains valid, and retains only its "identity", eg hash
table key, and that's also fine. It won't be destroyed until after
the end of the transaction. Which was the point, and it allows things
like bulk_write.c (and streaming_read.c) to hold an smgr reference.
Right?
+1.
I wonder if we can now simplify RelationCacheInvalidate() further. In
the loop above that smgrdestroyall():
while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
{
    relation = idhentry->reldesc;

    /* Must close all smgr references to avoid leaving dangling ptrs */
    RelationCloseSmgr(relation);

    /*
     * Ignore new relations; no other backend will manipulate them before
     * we commit. Likewise, before replacing a relation's relfilelocator,
     * we shall have acquired AccessExclusiveLock and drained any
     * applicable pending invalidations.
     */
    if (relation->rd_createSubid != InvalidSubTransactionId ||
        relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId)
        continue;
I don't think we need to call RelationCloseSmgr() for relations created
in the same transaction anymore.
--
Heikki Linnakangas
Neon (https://neon.tech)
On 10/03/2024 11:20, Heikki Linnakangas wrote:
On 10/03/2024 08:23, Thomas Munro wrote:
On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).

Ahhh, on second thoughts, I take that back, I think the original
theory still actually works just fine. It's just that somewhere in
our refactoring of that commit, when we were vacillating between
different semantics for 'destroy' and 'release', I think we made a
mistake: in RelationCacheInvalidate() I think we should now call
smgrreleaseall(), not smgrdestroyall().

Yes, I ran the reproducer with 'rr', and came to the same conclusion.
That smgrdestroyall() call destroys the SmgrRelation, breaking
the new assumption that an unpinned SmgrRelation is valid until end of
transaction.

That satisfies the
requirements for sinval queue overflow: we close md.c segments (and
most importantly virtual file descriptors), so our lack of sinval
records can't hurt us, we'll reopen all files as required. That's
what CLOBBER_CACHE_ALWAYS is effectively testing (and more). But the
smgr pointer remains valid, and retains only its "identity", eg hash
table key, and that's also fine. It won't be destroyed until after
the end of the transaction. Which was the point, and it allows things
like bulk_write.c (and streaming_read.c) to hold an smgr reference.
Right?

+1.

I wonder if we can now simplify RelationCacheInvalidate() further. In
the loop above that smgrdestroyall():

while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
{
    relation = idhentry->reldesc;

    /* Must close all smgr references to avoid leaving dangling ptrs */
    RelationCloseSmgr(relation);

    /*
     * Ignore new relations; no other backend will manipulate them before
     * we commit. Likewise, before replacing a relation's relfilelocator,
     * we shall have acquired AccessExclusiveLock and drained any
     * applicable pending invalidations.
     */
    if (relation->rd_createSubid != InvalidSubTransactionId ||
        relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId)
        continue;

I don't think we need to call RelationCloseSmgr() for relations created
in the same transaction anymore.
Barring objections, I'll commit the attached.
Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
not required for correctness AFAICS. We don't do it in single-rel
invalidation in RelationCacheInvalidateEntry() either.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
0001-Don-t-destroy-SMgrRelations-at-relcache-invalidation.patch (text/x-patch)
From be03591a4c69224de5b96335aa4a29b3462e2bc9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 10 Mar 2024 22:18:59 +0200
Subject: [PATCH 1/1] Don't destroy SMgrRelations at relcache invalidation
With commit 21d9c3ee4e, SMgrRelations remain valid until end of
transaction (or longer if they're "pinned"). Relcache invalidation can
happen in the middle of a transaction, so we must not destroy them at
relcache invalidation anymore.
This was revealed by failures in the 'constraints' test in buildfarm
animals using -DCLOBBER_CACHE_ALWAYS. That started failing with commit
8af2565248, which was the first commit that started to rely on an
SMgrRelation living until end of transaction.
Diagnosed-by: Tomas Vondra, Thomas Munro
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGK%2B5DOmLaBp3Z7C4S-Yv6yoROvr1UncjH2S1ZbPT8D%2BZg%40mail.gmail.com
---
src/backend/utils/cache/relcache.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ad6a35a50e..2cd19d603fb 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2985,9 +2985,6 @@ RelationCacheInvalidate(bool debug_discard)
     {
         relation = idhentry->reldesc;
 
-        /* Must close all smgr references to avoid leaving dangling ptrs */
-        RelationCloseSmgr(relation);
-
         /*
          * Ignore new relations; no other backend will manipulate them before
          * we commit. Likewise, before replacing a relation's relfilelocator,
@@ -3039,11 +3036,10 @@ RelationCacheInvalidate(bool debug_discard)
     }
 
     /*
-     * Now zap any remaining smgr cache entries. This must happen before we
-     * start to rebuild entries, since that may involve catalog fetches which
-     * will re-open catalog files.
+     * We cannot destroy the SMgrRelations as there might still be references
+     * to them, but close the underlying file descriptors.
      */
-    smgrdestroyall();
+    smgrreleaseall();
 
     /* Phase 2: rebuild the items found to need rebuild in phase 1 */
     foreach(l, rebuildFirstList)
--
2.39.2
On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Barring objections, I'll commit the attached.
+1
I guess the comment for smgrreleaseall() could also be updated. It
mentions only PROCSIGNAL_BARRIER_SMGRRELEASE, but I think sinval
overflow (InvalidateSystemCaches()) should also be mentioned?
Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
not required for correctness AFAICS. We don't do it in single-rel
invalidation in RelationCacheInvalidateEntry() either.
I think we do, because we have missed sinval messages. It's unlikely
but a relfilenode might have been recycled, and we might have file
descriptors that point to the unlinked files. That is, there are new
files with the same names and we need to open those ones.
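The hazard described there is easy to demonstrate outside the server: a cached
descriptor keeps reading the old, unlinked file after the name has been reused,
until it is closed and reopened. A standalone sketch (not PostgreSQL code):

/*
 * Why cached file descriptors must be dropped after invalidations may have
 * been missed: if the file name was recycled, the old descriptor still
 * reads the unlinked previous file, while a fresh open() sees the new one.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void
write_file(const char *path, const char *contents)
{
    int         fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd >= 0)
    {
        (void) write(fd, contents, strlen(contents));
        close(fd);
    }
}

int
main(void)
{
    char        buf[64] = {0};

    write_file("relfile", "old contents\n");

    /* Cache a descriptor, the way md.c keeps open segments around. */
    int         cached_fd = open("relfile", O_RDONLY);

    /* Meanwhile, "someone else" drops and recreates the same name. */
    unlink("relfile");
    write_file("relfile", "new contents\n");

    /* The cached descriptor still points at the old, unlinked file. */
    (void) pread(cached_fd, buf, sizeof(buf) - 1, 0);
    printf("via cached fd: %s", buf);

    /* Close and reopen -- what releasing the smgr handles enables. */
    close(cached_fd);
    cached_fd = open("relfile", O_RDONLY);
    memset(buf, 0, sizeof(buf));
    (void) pread(cached_fd, buf, sizeof(buf) - 1, 0);
    printf("after reopen:  %s", buf);

    close(cached_fd);
    unlink("relfile");
    return 0;
}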
On Mon, Mar 11, 2024 at 9:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
not required for correctness AFAICS. We don't do it in single-rel
invalidation in RelationCacheInvalidateEntry() either.

I think we do, because we have missed sinval messages. It's unlikely
but a relfilenode might have been recycled, and we might have file
descriptors that point to the unlinked files. That is, there are new
files with the same names and we need to open those ones.
... though I think you would be right if Dilip and Robert had
succeeded in their quest to introduce 56-bit non-cycling relfilenodes.
And for the record, we can also shoot ourselves in the foot in another
known case without sinval [1], so more work is needed here, but that
doesn't mean this sinval code path should also aim footwards.
[1]: /messages/by-id/CA+hUKGLs554tQFCUjv_vn7ft9Xv5LNjPoAd--3Df+JJKJ7A8kw@mail.gmail.com
On 10/03/2024 22:59, Thomas Munro wrote:
On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Barring objections, I'll commit the attached.
+1
Pushed, thanks!
I guess the comment for smgrreleaseall() could also be updated. It
mentions only PROCSIGNAL_BARRIER_SMGRRELEASE, but I think sinval
overflow (InvalidateSystemCaches()) should also be mentioned?
I removed that comment; people can grep to find the callers.
Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
not required for correctness AFAICS. We don't do it in single-rel
invalidation in RelationCacheInvalidateEntry() either.

I think we do, because we have missed sinval messages. It's unlikely
but a relfilenode might have been recycled, and we might have file
descriptors that point to the unlinked files. That is, there are new
files with the same names and we need to open those ones.
Gotcha.
--
Heikki Linnakangas
Neon (https://neon.tech)
Hello Heikki,
11.03.2024 10:09, Heikki Linnakangas wrote:
On 10/03/2024 22:59, Thomas Munro wrote:
On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Barring objections, I'll commit the attached.
+1
Pushed, thanks!
Please look at a new anomaly that I've discovered in master.
Starting from af0e7deb4, the following script:
numjobs=80
for ((i=1;i<=50;i++)); do
echo "I $i"
for ((j=1;j<=numjobs;j++)); do createdb db$j; done
for ((j=1;j<=numjobs;j++)); do
echo "
VACUUM FULL pg_class;
REINDEX INDEX pg_class_oid_index;
" | psql -d db$j >/dev/null 2>&1 &
echo "
CREATE TABLE tbl1 (t text);
DROP TABLE tbl1;
" | psql -d db$j >/dev/null 2>&1 &
done
wait
grep 'was terminated' server.log && break;
for ((j=1;j<=numjobs;j++)); do dropdb db$j; done
done
triggers a segfault:
2024-06-19 19:22:49.009 UTC [1607210:6] LOG: server process (PID 1607671) was terminated by signal 11: Segmentation fault
with the following stack trace:
Core was generated by `postgres: law db50 [local] CREATE TABLE '.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/1607671' in core file too small.
#0 0x000055d04cb8232e in RelationReloadNailed (relation=0x7f7d0a1b1fd8) at relcache.c:2415
2415 relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
(gdb) bt
#0 0x000055d04cb8232e in RelationReloadNailed (relation=0x7f7d0a1b1fd8) at relcache.c:2415
#1 0x000055d04cb8278c in RelationClearRelation (relation=0x7f7d0a1b1fd8, rebuild=true) at relcache.c:2560
#2 0x000055d04cb834e9 in RelationCacheInvalidate (debug_discard=false) at relcache.c:3048
#3 0x000055d04cb72e3c in InvalidateSystemCachesExtended (debug_discard=false) at inval.c:680
#4 0x000055d04cb73190 in InvalidateSystemCaches () at inval.c:794
#5 0x000055d04c9754ad in ReceiveSharedInvalidMessages (
invalFunction=0x55d04cb72eee <LocalExecuteInvalidationMessage>,
resetFunction=0x55d04cb7317e <InvalidateSystemCaches>) at sinval.c:105
#6 0x000055d04cb731b4 in AcceptInvalidationMessages () at inval.c:808
#7 0x000055d04c97d2ed in LockRelationOid (relid=2662, lockmode=1) at lmgr.c:136
#8 0x000055d04c404b1f in relation_open (relationId=2662, lockmode=1) at relation.c:55
#9 0x000055d04c47941e in index_open (relationId=2662, lockmode=1) at indexam.c:137
#10 0x000055d04c4787c2 in systable_beginscan (heapRelation=0x7f7d0a1b1fd8, indexId=2662, indexOK=true, snapshot=0x0,
nkeys=1, key=0x7ffd456a8570) at genam.c:396
#11 0x000055d04cb7e93d in ScanPgRelation (targetRelId=3466, indexOK=true, force_non_historic=false) at relcache.c:381
#12 0x000055d04cb7fe15 in RelationBuildDesc (targetRelId=3466, insertIt=true) at relcache.c:1093
#13 0x000055d04cb81c93 in RelationIdGetRelation (relationId=3466) at relcache.c:2108
#14 0x000055d04c404b29 in relation_open (relationId=3466, lockmode=1) at relation.c:58
#15 0x000055d04cb720a6 in BuildEventTriggerCache () at evtcache.c:129
#16 0x000055d04cb71f6a in EventCacheLookup (event=EVT_SQLDrop) at evtcache.c:68
#17 0x000055d04c61c037 in trackDroppedObjectsNeeded () at event_trigger.c:1342
#18 0x000055d04c61bf02 in EventTriggerBeginCompleteQuery () at event_trigger.c:1284
#19 0x000055d04c9ac744 in ProcessUtilitySlow (pstate=0x55d04d04aca0, pstmt=0x55d04d021540,
queryString=0x55d04d020830 "CREATE TABLE tbl1 (t text);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at utility.c:1107
#20 0x000055d04c9ac64d in standard_ProcessUtility (pstmt=0x55d04d021540,
queryString=0x55d04d020830 "CREATE TABLE tbl1 (t text);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at utility.c:1067
#21 0x000055d04c9ab54d in ProcessUtility (pstmt=0x55d04d021540,
queryString=0x55d04d020830 "CREATE TABLE tbl1 (t text);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at utility.c:523
#22 0x000055d04c9a9dc6 in PortalRunUtility (portal=0x55d04d0c1020, pstmt=0x55d04d021540, isTopLevel=true,
setHoldSnapshot=false, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at pquery.c:1158
#23 0x000055d04c9aa03d in PortalRunMulti (portal=0x55d04d0c1020, isTopLevel=true, setHoldSnapshot=false,
dest=0x55d04d021800, altdest=0x55d04d021800, qc=0x7ffd456a8fb0) at pquery.c:1315
#24 0x000055d04c9a9487 in PortalRun (portal=0x55d04d0c1020, count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x55d04d021800, altdest=0x55d04d021800, qc=0x7ffd456a8fb0) at pquery.c:791
#25 0x000055d04c9a2150 in exec_simple_query (query_string=0x55d04d020830 "CREATE TABLE tbl1 (t text);")
at postgres.c:1273
#26 0x000055d04c9a71f5 in PostgresMain (dbname=0x55d04d05dbf0 "db50", username=0x55d04d05dbd8 "law") at postgres.c:4675
#27 0x000055d04c8bd8c8 in BackendRun (port=0x55d04d0540c0) at postmaster.c:4475
#28 0x000055d04c8bcedd in BackendStartup (port=0x55d04d0540c0) at postmaster.c:4151
#29 0x000055d04c8b93a9 in ServerLoop () at postmaster.c:1769
#30 0x000055d04c8b8ca4 in PostmasterMain (argc=3, argv=0x55d04d01a770) at postmaster.c:1468
#31 0x000055d04c76c577 in main (argc=3, argv=0x55d04d01a770) at main.c:197
(gdb) p pg_class_tuple
$1 = (HeapTuple) 0x0
server.log might also contain:
2024-06-19 19:25:38.060 UTC [1618682:5] psql ERROR: could not read blocks 3..3 in file "base/28531/28840": read only 0
of 8192 bytes
Best regards,
Alexander
On Wed, Jun 19, 2024 at 11:00:00PM +0300, Alexander Lakhin wrote:
Starting from af0e7deb4, the following script:
[...]
triggers a segfault:
2024-06-19 19:22:49.009 UTC [1607210:6] LOG: server process (PID
1607671) was terminated by signal 11: Segmentation fault
Open item added for this issue.
--
Michael
On 19/06/2024 23:00, Alexander Lakhin wrote:
Please look at a new anomaly that I've discovered in master.
...
triggers a segfault:
2024-06-19 19:22:49.009 UTC [1607210:6] LOG: server process (PID 1607671) was terminated by signal 11: Segmentation fault
...
server.log might also contain:
2024-06-19 19:25:38.060 UTC [1618682:5] psql ERROR: could not read blocks 3..3 in file "base/28531/28840": read only 0
of 8192 bytes
Thanks for the report! I was not able to reproduce the segfault, but I
do see the "could not read blocks" error very quickly with the script.
In commit af0e7deb4a, I removed the call to RelationCloseSmgr() from
RelationCacheInvalidate(). I thought it was no longer needed, because we
no longer free the underlying SmgrRelation.
However, it meant that if the relfilenode of the relation was changed,
the relation keeps pointing to the SMgrRelation of the old relfilenode.
So we still need the RelationCloseSmgr() call, in case the relfilenode
has changed.
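The failure mode can be sketched at the relcache level too (simplified, not the
verbatim rel.h inline, and renamed to make that clear): the entry caches an smgr
handle lazily and then trusts it blindly, so nothing notices that rd_locator now
names a different relfilenumber until the old handle is closed.

/*
 * Simplified sketch of how a relcache entry caches its smgr handle (not the
 * verbatim source).  If rd_locator has been updated to a new relfilenumber
 * but the old handle was never closed, every access keeps going to the old
 * file -- which is why RelationCloseSmgr() is needed when the relfilelocator
 * may have changed.
 */
static inline SMgrRelation
RelationGetSmgrSketch(Relation rel)
{
    if (rel->rd_smgr == NULL)
        rel->rd_smgr = smgropen(rel->rd_locator, rel->rd_backend);

    /* Note what is *not* here: no check that rd_smgr matches rd_locator. */
    return rel->rd_smgr;
}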
Per attached patch.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
0001-Fix-relcache-invalidation-when-relfilelocator-is-upd.patch (text/x-patch)
From 17a1e68a636edd87a34acf34dace2a696af81a2d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 21 Jun 2024 01:45:53 +0300
Subject: [PATCH 1/1] Fix relcache invalidation when relfilelocator is updated
In commit af0e7deb4a, I removed this call to RelationCloseSmgr(),
because the dangling SMgrRelation was no longer an issue. However, we
still need to call it in case the relation's relfilelocator has changed,
so that we open the new relfile on the next access.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/987b1c8c-8c91-4847-ca0e-879f421680ff%40gmail.com
---
src/backend/utils/cache/relcache.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 35dbb87ae3..f45880d96d 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3032,6 +3032,9 @@ RelationCacheInvalidate(bool debug_discard)
     {
         relation = idhentry->reldesc;
 
+        /* Close all smgr references, in case the relfilelocator has changed */
+        RelationCloseSmgr(relation);
+
         /*
          * Ignore new relations; no other backend will manipulate them before
          * we commit. Likewise, before replacing a relation's relfilelocator,
--
2.39.2
Heikki Linnakangas <hlinnaka@iki.fi> writes:
In commit af0e7deb4a, I removed the call to RelationCloseSmgr() from
RelationCacheInvalidate(). I thought it was no longer needed, because we
no longer free the underlying SmgrRelation.
However, it meant that if the relfilenode of the relation was changed,
the relation keeps pointing to the SMgrRelation of the old relfilenode.
So we still need the RelationCloseSmgr() call, in case the relfilenode
has changed.
Ouch. How come we did not see this immediately in testing? I'd have
thought surely such a bug would be exposed by any command that
rewrites a heap.
regards, tom lane
On 21/06/2024 02:12, Tom Lane wrote:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
In commit af0e7deb4a, I removed the call to RelationCloseSmgr() from
RelationCacheInvalidate(). I thought it was no longer needed, because we
no longer free the underlying SmgrRelation.

However, it meant that if the relfilenode of the relation was changed,
the relation keeps pointing to the SMgrRelation of the old relfilenode.
So we still need the RelationCloseSmgr() call, in case the relfilenode
has changed.

Ouch. How come we did not see this immediately in testing? I'd have
thought surely such a bug would be exposed by any command that
rewrites a heap.
There is a RelationCloseSmgr() call in RelationClearRelation(), which
covers the common cases. This only occurs during
RelationCacheInvalidate(), when pg_class's relfilenumber was changed.
Hmm, looking closer, I think this might be a more appropriate place for
the RelationCloseSmgr() call:
    /*
     * If it's a mapped relation, immediately update its rd_locator in
     * case its relfilenumber changed. We must do this during phase 1
     * in case the relation is consulted during rebuild of other
     * relcache entries in phase 2. It's safe since consulting the
     * map doesn't involve any access to relcache entries.
     */
    if (RelationIsMapped(relation))
        RelationInitPhysicalAddr(relation);
That's where we change the relfilenumber, before the
RelationClearRelation() call.
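Spelled out, the placement being suggested would look roughly like this (a
sketch only, not the committed hunk):

    /*
     * When a mapped relation's relfilenumber may have changed, close the
     * old smgr handle right where rd_locator is refreshed, so that the
     * next access opens the new file.
     */
    if (RelationIsMapped(relation))
    {
        RelationCloseSmgr(relation);
        RelationInitPhysicalAddr(relation);
    }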
--
Heikki Linnakangas
Neon (https://neon.tech)
On 21/06/2024 02:25, Heikki Linnakangas wrote:
Hmm, looking closer, I think this might be a more appropriate place for
the RelationCloseSmgr() call:

    /*
     * If it's a mapped relation, immediately update its rd_locator in
     * case its relfilenumber changed. We must do this during phase 1
     * in case the relation is consulted during rebuild of other
     * relcache entries in phase 2. It's safe since consulting the
     * map doesn't involve any access to relcache entries.
     */
    if (RelationIsMapped(relation))
        RelationInitPhysicalAddr(relation);

That's where we change the relfilenumber, before the
RelationClearRelation() call.
Pushed a fix that way.
--
Heikki Linnakangas
Neon (https://neon.tech)