v13: CLUSTER segv with wal_level=minimal and parallel index creation
Following a bulk load, a CLUSTER command run by a maintenance script crashed.
This is currently reproducible on that instance, so please let me know if I
can provide more info.
< 2020-09-06 15:44:16.369 MDT >LOG: background worker "parallel worker" (PID 2576) was terminated by signal 6: Aborted
< 2020-09-06 15:44:16.369 MDT >DETAIL: Failed process was running: CLUSTER pg_attribute USING pg_attribute_relid_attnam_index
The crash happens during:
ts=# REINDEX INDEX pg_attribute_relid_attnum_index;
..but not:
ts=# REINDEX INDEX pg_attribute_relid_attnam_index ;
pg_catalog | pg_attribute_relid_attnam_index | index | postgres | pg_attribute | permanent | 31 MB |
pg_catalog | pg_attribute_relid_attnum_index | index | postgres | pg_attribute | permanent | 35 MB |
I suspect
|commit c6b92041d Skip WAL for new relfilenodes, under wal_level=minimal.
In fact, I set wal_level=minimal for the bulk load. Note also:
override | data_checksums | on
configuration file | checkpoint_timeout | 60
configuration file | maintenance_work_mem | 1048576
configuration file | max_wal_senders | 0
configuration file | wal_compression | on
configuration file | wal_level | minimal
configuration file | fsync | off
configuration file | full_page_writes | off
default | server_version | 13beta3
(gdb) bt
#0 0x00007ff9999ad387 in raise () from /lib64/libc.so.6
#1 0x00007ff9999aea78 in abort () from /lib64/libc.so.6
#2 0x0000000000921da5 in ExceptionalCondition (conditionName=conditionName@entry=0xad4078 "relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", errorType=errorType@entry=0x977f49 "FailedAssertion",
fileName=fileName@entry=0xad3068 "relcache.c", lineNumber=lineNumber@entry=2976) at assert.c:67
#3 0x000000000091a08b in AssertPendingSyncConsistency (relation=0x7ff99c2a70b8) at relcache.c:2976
#4 AssertPendingSyncs_RelationCache () at relcache.c:3036
#5 0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
#6 0x000000000053b1a4 in CommitTransaction () at xact.c:2118
#7 0x000000000053b826 in EndParallelWorkerTransaction () at xact.c:5300
#8 0x000000000052fcf7 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1479
#9 0x000000000076047a in StartBackgroundWorker () at bgworker.c:813
#10 0x000000000076d88d in do_start_bgworker (rw=0x23ac110) at postmaster.c:5865
#11 maybe_start_bgworkers () at postmaster.c:6091
#12 0x000000000076e43e in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5260
#13 <signal handler called>
#14 0x00007ff999a6c983 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000004887bc in ServerLoop () at postmaster.c:1691
#16 0x000000000076fb45 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x237d280) at postmaster.c:1400
#17 0x000000000048a83d in main (argc=3, argv=0x237d280) at main.c:210
(gdb) bt f
...
#4 AssertPendingSyncs_RelationCache () at relcache.c:3036
status = {hashp = 0x23cba50, curBucket = 449, curEntry = 0x0}
locallock = <optimized out>
rels = 0x23ff018
maxrels = <optimized out>
nrels = 0
idhentry = <optimized out>
i = <optimized out>
#5 0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
pending = <optimized out>
nrels = 0
maxrels = 0
srels = 0x0
scan = {hashp = 0x23edf60, curBucket = 9633000, curEntry = 0xe01600 <TopTransactionStateData>}
pendingsync = <optimized out>
#6 0x000000000053b1a4 in CommitTransaction () at xact.c:2118
s = 0xe01600 <TopTransactionStateData>
latestXid = <optimized out>
is_parallel_worker = true
__func__ = "CommitTransaction"
This is easily reproduced, at least on pg_attribute
[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/initdb -D pgsql13.dat
[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/postgres -D pgsql13.dat -c logging_collector=off -c port=5678 -c unix_socket_directories=/tmp -c wal_level=minimal -c max_wal_senders=0&
[pryzbyj@localhost ~]$ psql -h /tmp -p 5678 postgres -c "SET min_parallel_table_scan_size=0" -c "SET client_min_messages=debug" -c "REINDEX INDEX pg_attribute_relid_attnum_index"
SET
SET
DEBUG: building index "pg_attribute_relid_attnum_index" on table "pg_attribute" with request for 1 parallel worker
TRAP: FailedAssertion("relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", File: "relcache.c", Line: 2976)
postgres: parallel worker for PID 9637 (ExceptionalCondition+0x66)[0x921d86]
postgres: parallel worker for PID 9637 (AssertPendingSyncs_RelationCache+0x1db)[0x91a08b]
postgres: parallel worker for PID 9637 (smgrDoPendingSyncs+0x71)[0x58e591]
postgres: parallel worker for PID 9637 [0x53b1a4]
postgres: parallel worker for PID 9637 (EndParallelWorkerTransaction+0x16)[0x53b826]
postgres: parallel worker for PID 9637 (ParallelWorkerMain+0x437)[0x52fcf7]
postgres: parallel worker for PID 9637 (StartBackgroundWorker+0x25a)[0x76047a]
postgres: parallel worker for PID 9637 [0x76d88d]
postgres: parallel worker for PID 9637 [0x76e43e]
/lib64/libpthread.so.0(+0xf630)[0x7f46d4b47630]
/lib64/libc.so.6(__select+0x13)[0x7f46d26ba983]
postgres: parallel worker for PID 9637 [0x4887bc]
postgres: parallel worker for PID 9637 (PostmasterMain+0x1165)[0x76fb45]
postgres: parallel worker for PID 9637 (main+0x70d)[0x48a83d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f46d25e7555]
--
Justin
Justin Pryzby <pryzby@telsasoft.com> writes:
This is easily reproduced, at least on pg_attribute
Yeah, duplicated here. The assertion (not SIGSEGV) is in code
added by c6b92041d, so it seems clearly an open issue for v13.
I did not check to see what happens in a non-assert build.
regards, tom lane
At Mon, 07 Sep 2020 00:33:03 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
Justin Pryzby <pryzby@telsasoft.com> writes:
This is easily reproduced, at least on pg_attribute
Thanks for the repro.
Yeah, duplicated here. The assertion (not SIGSEGV) is in code
added by c6b92041d, so it seems clearly an open issue for v13.
I did not check to see what happens in a non-assert build.
The cause is that the worker received the pending-sync entry correctly
but never created a relcache entry for the relation using
RelationBuildDesc, so rd_firstRelfilenodeSubid is not correctly set.
I'm investigating it.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
The cause is that the worker received the pending-sync entry correctly
but never created a relcache entry for the relation using
RelationBuildDesc, so rd_firstRelfilenodeSubid is not correctly set.
I'm investigating it.
Relcache entries are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation when a lock is
taken, but pending syncs are not considered.
Since parallel workers don't access the files, we could just ignore
the assertion safely, but I want to set the rd_firstRelfilenodeSubid flag at
invalidation, as in the attached PoC patch.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment: fix_para_reindex_on_mapped_rels_opc.patch (text/x-patch, +17/-0)
On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:
At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
The cause is that the worker received the pending-sync entry correctly
but never created a relcache entry for the relation using
RelationBuildDesc, so rd_firstRelfilenodeSubid is not correctly set.
I'm investigating it.
Relcache entries are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation when a lock is
taken, but pending syncs are not considered.
Since parallel workers don't access the files, we could just ignore
the assertion safely, but I want to set the rd_firstRelfilenodeSubid flag at
invalidation, as in the attached PoC patch.
[patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
RelationBuildDesc() would do.]
As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?
printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf
Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.
At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in
On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:
At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
The cause is that the worker received the pending-sync entry correctly
but never created a relcache entry for the relation using
RelationBuildDesc, so rd_firstRelfilenodeSubid is not correctly set.
I'm investigating it.
Relcache entries are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation when a lock is
taken, but pending syncs are not considered.
Since parallel workers don't access the files, we could just ignore
the assertion safely, but I want to set the rd_firstRelfilenodeSubid flag at
invalidation, as in the attached PoC patch.
[patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
RelationBuildDesc() would do.]
As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?
printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf
Mmm. I was close to adding some tests to 018_wal_optimize.pl, but your
suggestion seems better. I added several lines to create_index.sql.
Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.
Mmm. You're right. I chose pg_amproc_fam_proc_index as the
nailed-but-not-mapped index.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment: 0001-Fix-assertion-failure-during-reindex-while-wal_level.patch (text/x-patch, +27/-1)
At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in
As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?
printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf
Mmm. I was close to adding some tests to 018_wal_optimize.pl, but your
suggestion seems better. I added several lines to create_index.sql.
Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.
Mmm. You're right. I chose pg_amproc_fam_proc_index as the
nailed-but-not-mapped index.
I fixed a typo (s/staring/starting/).
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment: v2-0001-Fix-assertion-failure-during-reindex-while-wal_le.patch (text/x-patch, +27/-1)
On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:
At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in
As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?
printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf
Mmm. I was close to adding some tests to 018_wal_optimize.pl, but your
suggestion seems better. I added several lines to create_index.sql.
Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.
Mmm. You're right. I chose pg_amproc_fam_proc_index as the
nailed-but-not-mapped index.
I fixed a typo (s/staring/starting/).
At a glance, this looks reasonable. If a closer look doesn't reveal problems,
I'll push this.
On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:
On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:
At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in
As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?
printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf
Mmm. I was close to adding some tests to 018_wal_optimize.pl, but your
suggestion seems better. I added several lines to create_index.sql.
After looking closer, I've moved the test to reindex_catalog.sql; see that
file's header comment for the reasons. One now needs this command:
printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf
Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.
Mmm. You're right. I chose pg_amproc_fam_proc_index as the
nailed-but-not-mapped index.
I fixed a typo (s/staring/starting/).
At a glance, this looks reasonable. If a closer look doesn't reveal problems,
I'll push this.
RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
can stop calling RelFileNodeSkippingWAL(). The attached version makes it so,
and I plan to push it.
Attachment: skip-wal-nailed-v3.patch (text/plain, +36/-8)
Thank you for the checking.
At Tue, 8 Sep 2020 21:05:43 -0700, Noah Misch <noah@leadboat.com> wrote in
On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:
After looking closer, I've moved the test to reindex_catalog.sql; see that
file's header comment for the reasons. One now needs this command:
printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf
I missed the file. It is surely the best place.
RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
can stop calling RelFileNodeSkippingWAL(). The attached version makes it so,
and I plan to push it.
Sure. relNode is filled with zeros, so RelationInitPhysicalAddr() works
correctly for RelationBuildDesc().
Both changes look good to me.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Wed, 09 Sep 2020 14:40:37 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
Sure. relNode is filled with zeros, so RelationInitPhysicalAddr() works
correctly for RelationBuildDesc().
Both changes look good to me.
Thank you for committing this!
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center