v13: CLUSTER segv with wal_level=minimal and parallel index creation

Started by Justin Pryzbyover 5 years ago12 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

Following a bulk load, a CLUSTER command run by a maintenance script crashed.
This is currently reproducible on that instance, so please suggest if I can
provide more info.

< 2020-09-06 15:44:16.369 MDT >LOG: background worker "parallel worker" (PID 2576) was terminated by signal 6: Aborted
< 2020-09-06 15:44:16.369 MDT >DETAIL: Failed process was running: CLUSTER pg_attribute USING pg_attribute_relid_attnam_index

The crash happens during:
ts=# REINDEX INDEX pg_attribute_relid_attnum_index;
..but not:
ts=# REINDEX INDEX pg_attribute_relid_attnam_index ;

pg_catalog | pg_attribute_relid_attnam_index | index | postgres | pg_attribute | permanent | 31 MB |
pg_catalog | pg_attribute_relid_attnum_index | index | postgres | pg_attribute | permanent | 35 MB |

I suspect
|commit c6b92041d Skip WAL for new relfilenodes, under wal_level=minimal.

In fact, I set wal_level=minimal for the bulk load. Note also:
override | data_checksums | on
configuration file | checkpoint_timeout | 60
configuration file | maintenance_work_mem | 1048576
configuration file | max_wal_senders | 0
configuration file | wal_compression | on
configuration file | wal_level | minimal
configuration file | fsync | off
configuration file | full_page_writes | off
default | server_version | 13beta3

(gdb) bt
#0 0x00007ff9999ad387 in raise () from /lib64/libc.so.6
#1 0x00007ff9999aea78 in abort () from /lib64/libc.so.6
#2 0x0000000000921da5 in ExceptionalCondition (conditionName=conditionName@entry=0xad4078 "relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", errorType=errorType@entry=0x977f49 "FailedAssertion",
fileName=fileName@entry=0xad3068 "relcache.c", lineNumber=lineNumber@entry=2976) at assert.c:67
#3 0x000000000091a08b in AssertPendingSyncConsistency (relation=0x7ff99c2a70b8) at relcache.c:2976
#4 AssertPendingSyncs_RelationCache () at relcache.c:3036
#5 0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
#6 0x000000000053b1a4 in CommitTransaction () at xact.c:2118
#7 0x000000000053b826 in EndParallelWorkerTransaction () at xact.c:5300
#8 0x000000000052fcf7 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1479
#9 0x000000000076047a in StartBackgroundWorker () at bgworker.c:813
#10 0x000000000076d88d in do_start_bgworker (rw=0x23ac110) at postmaster.c:5865
#11 maybe_start_bgworkers () at postmaster.c:6091
#12 0x000000000076e43e in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5260
#13 <signal handler called>
#14 0x00007ff999a6c983 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000004887bc in ServerLoop () at postmaster.c:1691
#16 0x000000000076fb45 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x237d280) at postmaster.c:1400
#17 0x000000000048a83d in main (argc=3, argv=0x237d280) at main.c:210

(gdb) bt f
...
#4 AssertPendingSyncs_RelationCache () at relcache.c:3036
status = {hashp = 0x23cba50, curBucket = 449, curEntry = 0x0}
locallock = <optimized out>
rels = 0x23ff018
maxrels = <optimized out>
nrels = 0
idhentry = <optimized out>
i = <optimized out>
#5 0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
pending = <optimized out>
nrels = 0
maxrels = 0
srels = 0x0
scan = {hashp = 0x23edf60, curBucket = 9633000, curEntry = 0xe01600 <TopTransactionStateData>}
pendingsync = <optimized out>
#6 0x000000000053b1a4 in CommitTransaction () at xact.c:2118
s = 0xe01600 <TopTransactionStateData>
latestXid = <optimized out>
is_parallel_worker = true
__func__ = "CommitTransaction"

#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#1)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

This is easily reproduced, at least on pg_attribute

[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/initdb -D pgsql13.dat
[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/postgres -D pgsql13.dat -c logging_collector=off -c port=5678 -c unix_socket_directories=/tmp -c wal-level=minimal -c max_wal_senders=0&
[pryzbyj@localhost ~]$ psql -h /tmp -p 5678 postgres -c "SET min_parallel_table_scan_size=0" -c "SET client_min_messages=debug" -c "REINDEX INDEX pg_attribute_relid_attnum_index"
SET
SET
DEBUG: building index "pg_attribute_relid_attnum_index" on table "pg_attribute" with request for 1 parallel worker
TRAP: FailedAssertion("relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", File: "relcache.c", Line: 2976)
postgres: parallel worker for PID 9637 (ExceptionalCondition+0x66)[0x921d86]
postgres: parallel worker for PID 9637 (AssertPendingSyncs_RelationCache+0x1db)[0x91a08b]
postgres: parallel worker for PID 9637 (smgrDoPendingSyncs+0x71)[0x58e591]
postgres: parallel worker for PID 9637 [0x53b1a4]
postgres: parallel worker for PID 9637 (EndParallelWorkerTransaction+0x16)[0x53b826]
postgres: parallel worker for PID 9637 (ParallelWorkerMain+0x437)[0x52fcf7]
postgres: parallel worker for PID 9637 (StartBackgroundWorker+0x25a)[0x76047a]
postgres: parallel worker for PID 9637 [0x76d88d]
postgres: parallel worker for PID 9637 [0x76e43e]
/lib64/libpthread.so.0(+0xf630)[0x7f46d4b47630]
/lib64/libc.so.6(__select+0x13)[0x7f46d26ba983]
postgres: parallel worker for PID 9637 [0x4887bc]
postgres: parallel worker for PID 9637 (PostmasterMain+0x1165)[0x76fb45]
postgres: parallel worker for PID 9637 (main+0x70d)[0x48a83d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f46d25e7555]

--
Justin

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#2)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

Justin Pryzby <pryzby@telsasoft.com> writes:

This is easily reproduced, at least on pg_attribute

Yeah, duplicated here. The assertion (not SIGSEGV) is in code
added by c6b92041d, so it seems clearly an open issue for v13.

I did not check to see what happens in a non-assert build.

regards, tom lane

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#3)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Mon, 07 Sep 2020 00:33:03 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in

Justin Pryzby <pryzby@telsasoft.com> writes:

This is easily reproduced, at least on pg_attribute

Thanks for the repro.

Yeah, duplicated here. The assertion (not SIGSEGV) is in code
added by c6b92041d, so it seems clearly an open issue for v13.

I did not check to see what happens in a non-assert build.

The cause is that the worker had received pending-sync entry correctly
but not never created a relcache entry for the relation using
RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
set.

I'm investigating it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#4)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

The cause is that the worker had received pending-sync entry correctly
but not never created a relcache entry for the relation using
RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
set.

I'm investigating it.

Relcaches are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation at taking a
lock but pending syncs are not considered.

Since parallel workers don't access the files so we can just ignore
the assertion safely, but I want to rd_firstRelfilenodeSubid flag at
invalidation, as attached PoC patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

fix_para_reindex_on_mapped_rels_opc.patchtext/x-patch; charset=us-asciiDownload+17-0
#6Noah Misch
noah@leadboat.com
In reply to: Kyotaro Horiguchi (#5)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:

At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

The cause is that the worker had received pending-sync entry correctly
but not never created a relcache entry for the relation using
RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
set.

I'm investigating it.

Relcaches are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation at taking a
lock but pending syncs are not considered.

Since parallel workers don't access the files so we can just ignore
the assertion safely, but I want to rd_firstRelfilenodeSubid flag at
invalidation, as attached PoC patch.

[patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
RelationBuildDesc() would do.]

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Noah Misch (#6)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in

On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:

At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

The cause is that the worker had received pending-sync entry correctly
but not never created a relcache entry for the relation using
RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
set.

I'm investigating it.

Relcaches are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation at taking a
lock but pending syncs are not considered.

Since parallel workers don't access the files so we can just ignore
the assertion safely, but I want to rd_firstRelfilenodeSubid flag at
invalidation, as attached PoC patch.

[patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
RelationBuildDesc() would do.]

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to add some tests to 018_wal_optimize.pl but your
suggestion seems better. I added several ines to create_index.sql.

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

Mmm. You're right. I choosed pg_amproc_fam_proc_index as
nailed-but-not-mapped index.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Fix-assertion-failure-during-reindex-while-wal_level.patchtext/x-patch; charset=us-asciiDownload+27-1
#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#7)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to add some tests to 018_wal_optimize.pl but your
suggestion seems better. I added several ines to create_index.sql.

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

Mmm. You're right. I choosed pg_amproc_fam_proc_index as
nailed-but-not-mapped index.

I fixed a typo (s/staring/starting/).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v2-0001-Fix-assertion-failure-during-reindex-while-wal_le.patchtext/x-patch; charset=us-asciiDownload+27-1
#9Noah Misch
noah@leadboat.com
In reply to: Kyotaro Horiguchi (#8)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:

At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to add some tests to 018_wal_optimize.pl but your
suggestion seems better. I added several ines to create_index.sql.

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

Mmm. You're right. I choosed pg_amproc_fam_proc_index as
nailed-but-not-mapped index.

I fixed a typo (s/staring/starting/).

At a glance, this looks reasonable. If a closer look doesn't reveal problems,
I'll push this.

#10Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#9)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:

On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:

At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to add some tests to 018_wal_optimize.pl but your
suggestion seems better. I added several ines to create_index.sql.

After looking closer, I've moved the test to reindex_catalog.sql; see that
file's header comment for the reasons. One now needs this command:

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

Mmm. You're right. I choosed pg_amproc_fam_proc_index as
nailed-but-not-mapped index.

I fixed a typo (s/staring/starting/).

At a glance, this looks reasonable. If a closer look doesn't reveal problems,
I'll push this.

RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
can stop calling RelFileNodeSkippingWAL(). The attached version makes it so,
and I plan to push it.

Attachments:

skip-wal-nailed-v3.patchtext/plain; charset=us-asciiDownload+36-8
#11Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Noah Misch (#10)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

Thank you for the checking.

At Tue, 8 Sep 2020 21:05:43 -0700, Noah Misch <noah@leadboat.com> wrote in

On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:
After looking closer, I've moved the test to reindex_catalog.sql; see that
file's header comment for the reasons. One now needs this command:

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf

I missed the file. It is surely the best place.

RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
can stop calling RelFileNodeSkippingWAL(). The attached version makes it so,
and I plan to push it.

Sure. relNode is filled with zeros so RelationInitPhysicalAddr() works
correctly for RelationBuildDesc().

Both changes look good to me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#11)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Wed, 09 Sep 2020 14:40:37 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

Sure. relNode is filled with zeros so RelationInitPhysicalAddr() works
correctly for RelationBuildDesc().

Both changes look good to me.

Thank you for committing this!

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center