v13: CLUSTER segv with wal_level=minimal and parallel index creation

Started by Justin Pryzbyover 5 years ago12 messages
#1Justin Pryzby
pryzby@telsasoft.com

Following a bulk load, a CLUSTER command run by a maintenance script crashed.
This is currently reproducible on that instance, so please suggest if I can
provide more info.

< 2020-09-06 15:44:16.369 MDT >LOG: background worker "parallel worker" (PID 2576) was terminated by signal 6: Aborted
< 2020-09-06 15:44:16.369 MDT >DETAIL: Failed process was running: CLUSTER pg_attribute USING pg_attribute_relid_attnam_index

The crash happens during:
ts=# REINDEX INDEX pg_attribute_relid_attnum_index;
..but not:
ts=# REINDEX INDEX pg_attribute_relid_attnam_index ;

pg_catalog | pg_attribute_relid_attnam_index | index | postgres | pg_attribute | permanent | 31 MB |
pg_catalog | pg_attribute_relid_attnum_index | index | postgres | pg_attribute | permanent | 35 MB |

I suspect
|commit c6b92041d Skip WAL for new relfilenodes, under wal_level=minimal.

In fact, I set wal_level=minimal for the bulk load. Note also:
override | data_checksums | on
configuration file | checkpoint_timeout | 60
configuration file | maintenance_work_mem | 1048576
configuration file | max_wal_senders | 0
configuration file | wal_compression | on
configuration file | wal_level | minimal
configuration file | fsync | off
configuration file | full_page_writes | off
default | server_version | 13beta3

(gdb) bt
#0 0x00007ff9999ad387 in raise () from /lib64/libc.so.6
#1 0x00007ff9999aea78 in abort () from /lib64/libc.so.6
#2 0x0000000000921da5 in ExceptionalCondition (conditionName=conditionName@entry=0xad4078 "relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", errorType=errorType@entry=0x977f49 "FailedAssertion",
fileName=fileName@entry=0xad3068 "relcache.c", lineNumber=lineNumber@entry=2976) at assert.c:67
#3 0x000000000091a08b in AssertPendingSyncConsistency (relation=0x7ff99c2a70b8) at relcache.c:2976
#4 AssertPendingSyncs_RelationCache () at relcache.c:3036
#5 0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
#6 0x000000000053b1a4 in CommitTransaction () at xact.c:2118
#7 0x000000000053b826 in EndParallelWorkerTransaction () at xact.c:5300
#8 0x000000000052fcf7 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1479
#9 0x000000000076047a in StartBackgroundWorker () at bgworker.c:813
#10 0x000000000076d88d in do_start_bgworker (rw=0x23ac110) at postmaster.c:5865
#11 maybe_start_bgworkers () at postmaster.c:6091
#12 0x000000000076e43e in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5260
#13 <signal handler called>
#14 0x00007ff999a6c983 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000004887bc in ServerLoop () at postmaster.c:1691
#16 0x000000000076fb45 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x237d280) at postmaster.c:1400
#17 0x000000000048a83d in main (argc=3, argv=0x237d280) at main.c:210

(gdb) bt f
...
#4 AssertPendingSyncs_RelationCache () at relcache.c:3036
status = {hashp = 0x23cba50, curBucket = 449, curEntry = 0x0}
locallock = <optimized out>
rels = 0x23ff018
maxrels = <optimized out>
nrels = 0
idhentry = <optimized out>
i = <optimized out>
#5 0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
pending = <optimized out>
nrels = 0
maxrels = 0
srels = 0x0
scan = {hashp = 0x23edf60, curBucket = 9633000, curEntry = 0xe01600 <TopTransactionStateData>}
pendingsync = <optimized out>
#6 0x000000000053b1a4 in CommitTransaction () at xact.c:2118
s = 0xe01600 <TopTransactionStateData>
latestXid = <optimized out>
is_parallel_worker = true
__func__ = "CommitTransaction"

#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#1)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

This is easily reproduced, at least on pg_attribute

[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/initdb -D pgsql13.dat
[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/postgres -D pgsql13.dat -c logging_collector=off -c port=5678 -c unix_socket_directories=/tmp -c wal-level=minimal -c max_wal_senders=0&
[pryzbyj@localhost ~]$ psql -h /tmp -p 5678 postgres -c "SET min_parallel_table_scan_size=0" -c "SET client_min_messages=debug" -c "REINDEX INDEX pg_attribute_relid_attnum_index"
SET
SET
DEBUG: building index "pg_attribute_relid_attnum_index" on table "pg_attribute" with request for 1 parallel worker
TRAP: FailedAssertion("relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", File: "relcache.c", Line: 2976)
postgres: parallel worker for PID 9637 (ExceptionalCondition+0x66)[0x921d86]
postgres: parallel worker for PID 9637 (AssertPendingSyncs_RelationCache+0x1db)[0x91a08b]
postgres: parallel worker for PID 9637 (smgrDoPendingSyncs+0x71)[0x58e591]
postgres: parallel worker for PID 9637 [0x53b1a4]
postgres: parallel worker for PID 9637 (EndParallelWorkerTransaction+0x16)[0x53b826]
postgres: parallel worker for PID 9637 (ParallelWorkerMain+0x437)[0x52fcf7]
postgres: parallel worker for PID 9637 (StartBackgroundWorker+0x25a)[0x76047a]
postgres: parallel worker for PID 9637 [0x76d88d]
postgres: parallel worker for PID 9637 [0x76e43e]
/lib64/libpthread.so.0(+0xf630)[0x7f46d4b47630]
/lib64/libc.so.6(__select+0x13)[0x7f46d26ba983]
postgres: parallel worker for PID 9637 [0x4887bc]
postgres: parallel worker for PID 9637 (PostmasterMain+0x1165)[0x76fb45]
postgres: parallel worker for PID 9637 (main+0x70d)[0x48a83d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f46d25e7555]

--
Justin

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#2)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

Justin Pryzby <pryzby@telsasoft.com> writes:

This is easily reproduced, at least on pg_attribute

Yeah, duplicated here. The assertion (not SIGSEGV) is in code
added by c6b92041d, so it seems clearly an open issue for v13.

I did not check to see what happens in a non-assert build.

regards, tom lane

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#3)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Mon, 07 Sep 2020 00:33:03 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in

Justin Pryzby <pryzby@telsasoft.com> writes:

This is easily reproduced, at least on pg_attribute

Thanks for the repro.

Yeah, duplicated here. The assertion (not SIGSEGV) is in code
added by c6b92041d, so it seems clearly an open issue for v13.

I did not check to see what happens in a non-assert build.

The cause is that the worker had received pending-sync entry correctly
but not never created a relcache entry for the relation using
RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
set.

I'm investigating it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#4)
1 attachment(s)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

The cause is that the worker had received pending-sync entry correctly
but not never created a relcache entry for the relation using
RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
set.

I'm investigating it.

Relcaches are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation at taking a
lock but pending syncs are not considered.

Since parallel workers don't access the files so we can just ignore
the assertion safely, but I want to rd_firstRelfilenodeSubid flag at
invalidation, as attached PoC patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

fix_para_reindex_on_mapped_rels_opc.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..d66e914345 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1322,10 +1322,27 @@ RelationInitPhysicalAddr(Relation relation)
 	}
 	else
 	{
+		Oid oldnode = relation->rd_node.relNode;
+
 		/* Consult the relation mapper */
 		relation->rd_node.relNode =
 			RelationMapOidToFilenode(relation->rd_id,
 									 relation->rd_rel->relisshared);
+
+		/*
+		 * The leader process might have replaced storage before staring this
+		 * worker. We need to set rd_firstRelfilenodeSubid according to pending
+		 * sync hash in that case. Currently the information is not actually
+		 * used in worker processes, but make things tidy.
+		 */
+		if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+		{
+			if (RelFileNodeSkippingWAL(relation->rd_node))
+				relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+			else
+				relation->rd_firstRelfilenodeSubid = InvalidTransactionId;
+		}
+
 		if (!OidIsValid(relation->rd_node.relNode))
 			elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 				 RelationGetRelationName(relation), relation->rd_id);
#6Noah Misch
noah@leadboat.com
In reply to: Kyotaro Horiguchi (#5)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:

At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

The cause is that the worker had received pending-sync entry correctly
but not never created a relcache entry for the relation using
RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
set.

I'm investigating it.

Relcaches are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation at taking a
lock but pending syncs are not considered.

Since parallel workers don't access the files so we can just ignore
the assertion safely, but I want to rd_firstRelfilenodeSubid flag at
invalidation, as attached PoC patch.

[patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
RelationBuildDesc() would do.]

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Noah Misch (#6)
1 attachment(s)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in

On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:

At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

The cause is that the worker had received pending-sync entry correctly
but not never created a relcache entry for the relation using
RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
set.

I'm investigating it.

Relcaches are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation at taking a
lock but pending syncs are not considered.

Since parallel workers don't access the files so we can just ignore
the assertion safely, but I want to rd_firstRelfilenodeSubid flag at
invalidation, as attached PoC patch.

[patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
RelationBuildDesc() would do.]

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to add some tests to 018_wal_optimize.pl but your
suggestion seems better. I added several ines to create_index.sql.

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

Mmm. You're right. I choosed pg_amproc_fam_proc_index as
nailed-but-not-mapped index.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Fix-assertion-failure-during-reindex-while-wal_level.patchtext/x-patch; charset=us-asciiDownload
From c4d53acd64011f1ed5652177923d010f4c50fa89 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Mon, 7 Sep 2020 20:30:30 +0900
Subject: [PATCH] Fix assertion failure during reindex while wal_level=minimal.

When wal_level=minimal, nailed relations are forgotten to set
rd_firstRelfilenodeSubid correctly on parallel workers. Fix it.
---
 src/backend/utils/cache/relcache.c         | 16 ++++++++++++++++
 src/test/regress/expected/create_index.out |  5 +++++
 src/test/regress/sql/create_index.sql      |  6 ++++++
 3 files changed, 27 insertions(+)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..cd6c8bbc70 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1273,6 +1273,8 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	Oid oldnode = relation->rd_node.relNode;
+
 	/* these relations kinds never have storage */
 	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
 		return;
@@ -1330,6 +1332,20 @@ RelationInitPhysicalAddr(Relation relation)
 			elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 				 RelationGetRelationName(relation), relation->rd_id);
 	}
+
+	/*
+	 * Our leader process might have replaced storage before staring this
+	 * worker. We need to set rd_firstRelfilenodeSubid according to pending
+	 * sync hash in that case. Currently the information is not actually used
+	 * in worker processes, but make things tidy.
+	 */
+	if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+	{
+		if (RelFileNodeSkippingWAL(relation->rd_node))
+			relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+		else
+			relation->rd_firstRelfilenodeSubid = InvalidTransactionId;
+	}
 }
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 814416d936..6468418639 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2598,6 +2598,11 @@ REINDEX TABLE pg_toast.pg_toast_1260;
 ERROR:  must be owner of table pg_toast_1260
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 ERROR:  must be owner of index pg_toast_1260_index
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index;	-- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; 		-- nailed but not mapped
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index f3667bacdc..d9cb3f14f6 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1115,6 +1115,12 @@ SET SESSION ROLE regress_reindexuser;
 REINDEX TABLE pg_toast.pg_toast_1260;
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index;	-- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; 		-- nailed but not mapped
+
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
-- 
2.18.4

#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#7)
1 attachment(s)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to add some tests to 018_wal_optimize.pl but your
suggestion seems better. I added several ines to create_index.sql.

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

Mmm. You're right. I choosed pg_amproc_fam_proc_index as
nailed-but-not-mapped index.

I fixed a typo (s/staring/starting/).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v2-0001-Fix-assertion-failure-during-reindex-while-wal_le.patchtext/x-patch; charset=us-asciiDownload
From 44512f8672395dc40a001eb92b7ffca060eb411d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Mon, 7 Sep 2020 20:30:30 +0900
Subject: [PATCH v2] Fix assertion failure during reindex while
 wal_level=minimal.

When wal_level=minimal, nailed relations are forgotten to set
rd_firstRelfilenodeSubid correctly on parallel workers. Fix it.
---
 src/backend/utils/cache/relcache.c         | 16 ++++++++++++++++
 src/test/regress/expected/create_index.out |  5 +++++
 src/test/regress/sql/create_index.sql      |  6 ++++++
 3 files changed, 27 insertions(+)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..8a61fd674e 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1273,6 +1273,8 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	Oid oldnode = relation->rd_node.relNode;
+
 	/* these relations kinds never have storage */
 	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
 		return;
@@ -1330,6 +1332,20 @@ RelationInitPhysicalAddr(Relation relation)
 			elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 				 RelationGetRelationName(relation), relation->rd_id);
 	}
+
+	/*
+	 * Our leader process might have replaced storage before starting this
+	 * worker. We need to set rd_firstRelfilenodeSubid according to pending
+	 * sync hash in that case. Currently the information is not actually used
+	 * in worker processes, but make things tidy.
+	 */
+	if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+	{
+		if (RelFileNodeSkippingWAL(relation->rd_node))
+			relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+		else
+			relation->rd_firstRelfilenodeSubid = InvalidTransactionId;
+	}
 }
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 64c0c66859..66ec1465e1 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2732,6 +2732,11 @@ REINDEX TABLE pg_toast.pg_toast_1260;
 ERROR:  must be owner of table pg_toast_1260
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 ERROR:  must be owner of index pg_toast_1260_index
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index;	-- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; 		-- nailed but not mapped
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 37f7259da9..6b104a2389 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1191,6 +1191,12 @@ SET SESSION ROLE regress_reindexuser;
 REINDEX TABLE pg_toast.pg_toast_1260;
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index;	-- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; 		-- nailed but not mapped
+
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
-- 
2.18.4

#9Noah Misch
noah@leadboat.com
In reply to: Kyotaro Horiguchi (#8)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:

At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to add some tests to 018_wal_optimize.pl but your
suggestion seems better. I added several ines to create_index.sql.

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

Mmm. You're right. I choosed pg_amproc_fam_proc_index as
nailed-but-not-mapped index.

I fixed a typo (s/staring/starting/).

At a glance, this looks reasonable. If a closer look doesn't reveal problems,
I'll push this.

#10Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#9)
1 attachment(s)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:

On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:

At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch <noah@leadboat.com> wrote in

As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to add some tests to 018_wal_optimize.pl but your
suggestion seems better. I added several ines to create_index.sql.

After looking closer, I've moved the test to reindex_catalog.sql; see that
file's header comment for the reasons. One now needs this command:

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.

Mmm. You're right. I choosed pg_amproc_fam_proc_index as
nailed-but-not-mapped index.

I fixed a typo (s/staring/starting/).

At a glance, this looks reasonable. If a closer look doesn't reveal problems,
I'll push this.

RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
can stop calling RelFileNodeSkippingWAL(). The attached version makes it so,
and I plan to push it.

Attachments:

skip-wal-nailed-v3.patchtext/plain; charset=us-asciiDownload
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix rd_firstRelfilenodeSubid for nailed relations, in parallel workers.
    
    Move applicable code out of RelationBuildDesc(), which nailed relations
    bypass.  Non-assert builds experienced no known problems.  Back-patch to
    v13, where commit c6b92041d38512a4176ed76ad06f713d2e6c01a8 introduced
    rd_firstRelfilenodeSubid.
    
    Kyotaro Horiguchi.  Reported by Justin Pryzby.
    
    Discussion: https://postgr.es/m/20200907023737.GA7158@telsasoft.com

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad0..9061af8 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1243,14 +1243,6 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 	if (insertIt)
 		RelationCacheInsert(relation, true);
 
-	/*
-	 * For RelationNeedsWAL() to answer correctly on parallel workers, restore
-	 * rd_firstRelfilenodeSubid.  No subtransactions start or end while in
-	 * parallel mode, so the specific SubTransactionId does not matter.
-	 */
-	if (IsParallelWorker() && RelFileNodeSkippingWAL(relation->rd_node))
-		relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
-
 	/* It's fully valid */
 	relation->rd_isvalid = true;
 
@@ -1273,6 +1265,8 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	Oid			oldnode = relation->rd_node.relNode;
+
 	/* these relations kinds never have storage */
 	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
 		return;
@@ -1330,6 +1324,19 @@ RelationInitPhysicalAddr(Relation relation)
 			elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 				 RelationGetRelationName(relation), relation->rd_id);
 	}
+
+	/*
+	 * For RelationNeedsWAL() to answer correctly on parallel workers, restore
+	 * rd_firstRelfilenodeSubid.  No subtransactions start or end while in
+	 * parallel mode, so the specific SubTransactionId does not matter.
+	 */
+	if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+	{
+		if (RelFileNodeSkippingWAL(relation->rd_node))
+			relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+		else
+			relation->rd_firstRelfilenodeSubid = InvalidSubTransactionId;
+	}
 }
 
 /*
diff --git a/src/test/regress/expected/reindex_catalog.out b/src/test/regress/expected/reindex_catalog.out
index 4b5fba4..204f056 100644
--- a/src/test/regress/expected/reindex_catalog.out
+++ b/src/test/regress/expected/reindex_catalog.out
@@ -36,3 +36,13 @@ REINDEX INDEX pg_index_indexrelid_index; -- non-mapped, non-shared, critical
 REINDEX INDEX pg_index_indrelid_index; -- non-mapped, non-shared, non-critical
 REINDEX INDEX pg_database_oid_index; -- mapped, shared, critical
 REINDEX INDEX pg_shdescription_o_c_index; -- mapped, shared, non-critical
+-- Check the same REINDEX INDEX statements under parallelism.
+BEGIN;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_class_oid_index; -- mapped, non-shared, critical
+REINDEX INDEX pg_class_relname_nsp_index; -- mapped, non-shared, non-critical
+REINDEX INDEX pg_index_indexrelid_index; -- non-mapped, non-shared, critical
+REINDEX INDEX pg_index_indrelid_index; -- non-mapped, non-shared, non-critical
+REINDEX INDEX pg_database_oid_index; -- mapped, shared, critical
+REINDEX INDEX pg_shdescription_o_c_index; -- mapped, shared, non-critical
+ROLLBACK;
diff --git a/src/test/regress/sql/reindex_catalog.sql b/src/test/regress/sql/reindex_catalog.sql
index 87ecf52..8203641 100644
--- a/src/test/regress/sql/reindex_catalog.sql
+++ b/src/test/regress/sql/reindex_catalog.sql
@@ -39,3 +39,14 @@ REINDEX INDEX pg_index_indexrelid_index; -- non-mapped, non-shared, critical
 REINDEX INDEX pg_index_indrelid_index; -- non-mapped, non-shared, non-critical
 REINDEX INDEX pg_database_oid_index; -- mapped, shared, critical
 REINDEX INDEX pg_shdescription_o_c_index; -- mapped, shared, non-critical
+
+-- Check the same REINDEX INDEX statements under parallelism.
+BEGIN;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_class_oid_index; -- mapped, non-shared, critical
+REINDEX INDEX pg_class_relname_nsp_index; -- mapped, non-shared, non-critical
+REINDEX INDEX pg_index_indexrelid_index; -- non-mapped, non-shared, critical
+REINDEX INDEX pg_index_indrelid_index; -- non-mapped, non-shared, non-critical
+REINDEX INDEX pg_database_oid_index; -- mapped, shared, critical
+REINDEX INDEX pg_shdescription_o_c_index; -- mapped, shared, non-critical
+ROLLBACK;
#11Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Noah Misch (#10)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

Thank you for the checking.

At Tue, 8 Sep 2020 21:05:43 -0700, Noah Misch <noah@leadboat.com> wrote in

On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:
After looking closer, I've moved the test to reindex_catalog.sql; see that
file's header comment for the reasons. One now needs this command:

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf

I missed the file. It is surely the best place.

RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
can stop calling RelFileNodeSkippingWAL(). The attached version makes it so,
and I plan to push it.

Sure. relNode is filled with zeros so RelationInitPhysicalAddr() works
correctly for RelationBuildDesc().

Both changes look good to me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#11)
Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

At Wed, 09 Sep 2020 14:40:37 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

Sure. relNode is filled with zeros so RelationInitPhysicalAddr() works
correctly for RelationBuildDesc().

Both changes look good to me.

Thank you for committing this!

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center