CREATE UNLOGGED TABLE segfaults when debug_discard_caches=1

Started by Spyridon Dimitrios Agathos over 3 years ago · 7 messages · hackers
#1 Spyridon Dimitrios Agathos
spyridon.dimitrios.agathos@gmail.com

Hi Hackers,

while testing the developer settings of PostgreSQL (14.5), I came across this
issue:

postgres=# CREATE UNLOGGED TABLE stats (
postgres(# pg_hash BIGINT NOT NULL,
postgres(# category TEXT NOT NULL,
postgres(# PRIMARY KEY (pg_hash, category)
postgres(# );
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Checking the stack trace, I found this:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000ab6662 in smgrwrite (reln=0x0, forknum=INIT_FORKNUM,
blocknum=0, buffer=0x2b5eec0 "", skipFsync=true)
at
/opt/postgresql-src/debug-build/../src/backend/storage/smgr/smgr.c:526
526 smgrsw[reln->smgr_which].smgr_write(reln, forknum, blocknum,
(gdb) bt
#0 0x0000000000ab6662 in smgrwrite (reln=0x0, forknum=INIT_FORKNUM,
blocknum=0, buffer=0x2b5eec0 "", skipFsync=true) at
/opt/postgresql-src/debug-build/../src/backend/storage/smgr/smgr.c:526
#1 0x000000000056991b in btbuildempty (index=0x7fe60ac9be60) at
/opt/postgresql-src/debug-build/../src/backend/access/nbtree/nbtree.c:166
#2 0x0000000000623ad9 in index_build (heapRelation=0x7fe60ac9c078,
indexRelation=0x7fe60ac9be60, indexInfo=0x2b4c330, isreindex=false,
parallel=true) at
/opt/postgresql-src/debug-build/../src/backend/catalog/index.c:3028
#3 0x0000000000621886 in index_create (heapRelation=0x7fe60ac9c078,
indexRelationName=0x2b4c448 "stats_pkey", indexRelationId=16954,
parentIndexRelid=0, parentConstraintId=0, relFileNode=0,
indexInfo=0x2b4c330, indexColNames=0x2b4bee8, accessMethodObjectId=403,
tableSpaceId=0, collationObjectId=0x2b4c560, classObjectId=0x2b4c580,
coloptions=0x2b4c5a0, reloptions=0,
flags=3, constr_flags=0, allow_system_table_mods=false,
is_internal=false, constraintId=0x7ffef5cc4a7c) at
/opt/postgresql-src/debug-build/../src/backend/catalog/index.c:1232
#4 0x000000000074af6e in DefineIndex (relationId=16949, stmt=0x2b527a0,
indexRelationId=0, parentIndexId=0, parentConstraintId=0,
is_alter_table=false, check_rights=true, check_not_in_use=true,
skip_build=false, quiet=false) at
/opt/postgresql-src/debug-build/../src/backend/commands/indexcmds.c:1164
#5 0x0000000000ac8d78 in ProcessUtilitySlow (pstate=0x2b49230,
pstmt=0x2b48fe8, queryString=0x2a71650 "CREATE UNLOGGED TABLE stats (\n
pg_hash BIGINT NOT NULL,\n category TEXT NOT NULL,\n PRIMARY KEY
(pg_hash, category)\n);", context=PROCESS_UTILITY_SUBCOMMAND, params=0x0,
queryEnv=0x0, dest=0xe9ceb0 <donothingDR>, qc=0x0)
at /opt/postgresql-src/debug-build/../src/backend/tcop/utility.c:1535
#6 0x0000000000ac6637 in standard_ProcessUtility (pstmt=0x2b48fe8,
queryString=0x2a71650 "CREATE UNLOGGED TABLE stats (\n pg_hash BIGINT
NOT NULL,\n category TEXT NOT NULL,\n PRIMARY KEY (pg_hash,
category)\n);", readOnlyTree=false, context=PROCESS_UTILITY_SUBCOMMAND,
params=0x0, queryEnv=0x0, dest=0xe9ceb0 <donothingDR>, qc=0x0)
at /opt/postgresql-src/debug-build/../src/backend/tcop/utility.c:1066
#7 0x0000000000ac548b in ProcessUtility (pstmt=0x2b48fe8,
queryString=0x2a71650 "CREATE UNLOGGED TABLE stats (\n pg_hash BIGINT
NOT NULL,\n category TEXT NOT NULL,\n PRIMARY KEY (pg_hash,
category)\n);", readOnlyTree=false, context=PROCESS_UTILITY_SUBCOMMAND,
params=0x0, queryEnv=0x0, dest=0xe9ceb0 <donothingDR>, qc=0x0)
at /opt/postgresql-src/debug-build/../src/backend/tcop/utility.c:527
#8 0x0000000000ac7e5e in ProcessUtilitySlow (pstate=0x2b52b10,
pstmt=0x2a72d28, queryString=0x2a71650 "CREATE UNLOGGED TABLE stats (\n
pg_hash BIGINT NOT NULL,\n category TEXT NOT NULL,\n PRIMARY KEY
(pg_hash, category)\n);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x2a72df8, qc=0x7ffef5cc6c10)
at /opt/postgresql-src/debug-build/../src/backend/tcop/utility.c:1244
#9 0x0000000000ac6637 in standard_ProcessUtility (pstmt=0x2a72d28,
queryString=0x2a71650 "CREATE UNLOGGED TABLE stats (\n pg_hash BIGINT
NOT NULL,\n category TEXT NOT NULL,\n PRIMARY KEY (pg_hash,
category)\n);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x2a72df8, qc=0x7ffef5cc6c10)
at /opt/postgresql-src/debug-build/../src/backend/tcop/utility.c:1066
#10 0x0000000000ac548b in ProcessUtility (pstmt=0x2a72d28,
queryString=0x2a71650 "CREATE UNLOGGED TABLE stats (\n pg_hash BIGINT
NOT NULL,\n category TEXT NOT NULL,\n PRIMARY KEY (pg_hash,
category)\n);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x2a72df8, qc=0x7ffef5cc6c10)
at /opt/postgresql-src/debug-build/../src/backend/tcop/utility.c:527
#11 0x0000000000ac4aad in PortalRunUtility (portal=0x2b06bf0,
pstmt=0x2a72d28, isTopLevel=true, setHoldSnapshot=false, dest=0x2a72df8,
qc=0x7ffef5cc6c10) at
/opt/postgresql-src/debug-build/../src/backend/tcop/pquery.c:1155
#12 0x0000000000ac3b57 in PortalRunMulti (portal=0x2b06bf0,
isTopLevel=true, setHoldSnapshot=false, dest=0x2a72df8, altdest=0x2a72df8,
qc=0x7ffef5cc6c10) at
/opt/postgresql-src/debug-build/../src/backend/tcop/pquery.c:1312
#13 0x0000000000ac306f in PortalRun (portal=0x2b06bf0,
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2a72df8,
altdest=0x2a72df8, qc=0x7ffef5cc6c10) at
/opt/postgresql-src/debug-build/../src/backend/tcop/pquery.c:788
#14 0x0000000000abdfad in exec_simple_query (query_string=0x2a71650 "CREATE
UNLOGGED TABLE stats (\n pg_hash BIGINT NOT NULL,\n category TEXT NOT
NULL,\n PRIMARY KEY (pg_hash, category)\n);") at
/opt/postgresql-src/debug-build/../src/backend/tcop/postgres.c:1213
#15 0x0000000000abd1fb in PostgresMain (argc=1, argv=0x7ffef5cc6e50,
dbname=0x2a9cb90 "postgres", username=0x2a9cb68 "host_user") at
/opt/postgresql-src/debug-build/../src/backend/tcop/postgres.c:4496
#16 0x00000000009c2b4a in BackendRun (port=0x2a964c0) at
/opt/postgresql-src/debug-build/../src/backend/postmaster/postmaster.c:4530
#17 0x00000000009c2074 in BackendStartup (port=0x2a964c0) at
/opt/postgresql-src/debug-build/../src/backend/postmaster/postmaster.c:4252
#18 0x00000000009c0e27 in ServerLoop () at
/opt/postgresql-src/debug-build/../src/backend/postmaster/postmaster.c:1745
#19 0x00000000009be275 in PostmasterMain (argc=3, argv=0x2a6add0) at
/opt/postgresql-src/debug-build/../src/backend/postmaster/postmaster.c:1417
#20 0x0000000000896dc3 in main (argc=3, argv=0x2a6add0) at
/opt/postgresql-src/debug-build/../src/backend/main/main.c:209

The error does not appear if the table is not defined as UNLOGGED, or if
the primary key is not compound.
Is this developer option perhaps not used by the community when running
tests?

Kind regards,

--
Spiros
(ServiceNow)

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Spyridon Dimitrios Agathos (#1)

Spyridon Dimitrios Agathos <spyridon.dimitrios.agathos@gmail.com> writes:

while testing the developer settings of PostgreSQL (14.5), I came across this
issue:

postgres=# CREATE UNLOGGED TABLE stats (
postgres(# pg_hash BIGINT NOT NULL,
postgres(# category TEXT NOT NULL,
postgres(# PRIMARY KEY (pg_hash, category)
postgres(# );
server closed the connection unexpectedly

Hmm ... confirmed in the v14 branch, but v15 and HEAD are fine,
evidently as a result of commit f10f0ae42 having replaced this
unprotected use of index->rd_smgr.

I wonder whether we ought to back-patch f10f0ae42. We could
leave the RelationOpenSmgr macro in existence to avoid unnecessary
breakage of extension code, but stop using it within our own code.

regards, tom lane
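A minimal, self-contained C model of the failure may help connect Tom's diagnosis to the backtrace. It rests on stated assumptions: MockRelation, MockSMgr, mock_smgropen, mock_discard_caches, mock_smgrwrite and the write_init_fork_* functions are hypothetical stand-ins, not PostgreSQL code. The model assumes that a forced cache flush reaching a still-open relation resets its cached rd_smgr to NULL, which is consistent with the reln=0x0 in the backtrace. It also assumes that the RelationGetSmgr approach from f10f0ae42 avoids the crash by re-opening the handle whenever the cached pointer is NULL, instead of trusting a pointer fetched before an invalidation point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT PostgreSQL's real SMgrRelation or Relation. */
typedef struct MockSMgr
{
    int         is_open;        /* pretend low-level file state */
} MockSMgr;

typedef struct MockRelation
{
    MockSMgr   *rd_smgr;        /* cached handle; a cache flush resets it */
} MockRelation;

static MockSMgr shared_handle = {1};

/* Stand-in for smgropen(): always returns a valid handle. */
static MockSMgr *
mock_smgropen(void)
{
    return &shared_handle;
}

/* Stand-in for a forced cache flush reaching a relation that is still open. */
static void
mock_discard_caches(MockRelation *rel)
{
    rel->rd_smgr = NULL;
}

/* Getter in the spirit of RelationGetSmgr(): reopen if the handle was closed. */
static inline MockSMgr *
mock_get_smgr(MockRelation *rel)
{
    if (rel->rd_smgr == NULL)
        rel->rd_smgr = mock_smgropen();
    assert(rel->rd_smgr != NULL);
    return rel->rd_smgr;
}

/* Stand-in for smgrwrite(): returns -1 where the real code would dereference NULL. */
static int
mock_smgrwrite(MockSMgr *reln)
{
    if (reln == NULL)
        return -1;
    return reln->is_open ? 0 : -1;
}

/* Old shape: open once, an invalidation point intervenes, then use the field. */
static int
write_init_fork_old(MockRelation *rel)
{
    if (rel->rd_smgr == NULL)
        rel->rd_smgr = mock_smgropen();     /* like RelationOpenSmgr() */
    mock_discard_caches(rel);               /* any step that accepts invalidations */
    return mock_smgrwrite(rel->rd_smgr);    /* reln == NULL, as in the backtrace */
}

/* New shape: the same flush, but the handle is fetched at the point of use. */
static int
write_init_fork_new(MockRelation *rel)
{
    mock_discard_caches(rel);
    return mock_smgrwrite(mock_get_smgr(rel));
}
```

In this model, write_init_fork_old() returns -1 (the point where the real code would crash) while write_init_fork_new() returns 0 and leaves a valid handle cached. The getter is cheap when the handle is already open, so calling it at every use site costs little.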

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)

I wrote:

I wonder whether we ought to back-patch f10f0ae42. We could
leave the RelationOpenSmgr macro in existence to avoid unnecessary
breakage of extension code, but stop using it within our own code.

Concretely, about like this for v14 (didn't look at the older
branches yet).

I'm not sure whether to recommend that outside extensions switch to using
RelationGetSmgr in pre-v15 branches. If they do, they run a risk
of compile failure should they be built against old back-branch
headers. Once compiled, though, they'd work against any minor release
(since RelationGetSmgr is static inline, not something in the core
backend). So maybe that'd be good enough, and keeping their code in
sync with what they need for v15 would be worth something.

regards, tom lane

Attachments:

backpatch-f10f0ae42-v14.patch (text/x-diff; charset=us-ascii) +171 -162
#4 David Geier
geidav.pg@gmail.com
In reply to: Tom Lane (#3)

Hi Tom,

Back-patching but keeping RelationOpenSmgr() for extensions sounds
reasonable.

On a different note: are we frequently running our test suites with
debug_discard_caches=1 enabled?
It doesn't seem like it. I just ran make check with debug_discard_caches=1 on

- latest master: everything passes.
- version 14.5: fails in create_index, create_index_spgist, create_view.

So the buggy code path is at least covered by the tests. But it seems
like we could have found it earlier by regularly running with
debug_discard_caches=1.

--
David Geier
(ServiceNow)

On 11/17/22 18:51, Tom Lane wrote:

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Geier (#4)

David Geier <geidav.pg@gmail.com> writes:

On a different note: are we frequently running our test suites with
debug_discard_caches=1 enabled?
It doesn't seem like it.

Hmm. Buildfarm members avocet and trilobite are supposed to be
doing that, but their runtimes of late put the lie to it.
Configuration option got lost somewhere?

prion is running with -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE,
which I would have thought would be enough to catch this, but I guess
not.

regards, tom lane
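One possible explanation for why prion's -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE build did not catch this, stated as an assumption rather than something established in the thread: force-release only drops entries whose reference count has reached zero, while debug_discard_caches flushes entries even while they are held open, and the index in btbuildempty is held open throughout. The toy model below (MockEntry, close_with_force_release, discard_all and index_handle_survives are all hypothetical) illustrates just that difference.

```c
#include <assert.h>

/* Hypothetical model of a cache entry; NOT PostgreSQL's RelationData. */
typedef struct MockEntry
{
    int         refcount;       /* > 0 while some caller holds it open */
    int         smgr_open;      /* 1 while the cached storage handle is valid */
} MockEntry;

/* "Force release" model: an entry is cleared only when its last reference
 * is dropped, so an entry that stays open keeps its handle. */
static void
close_with_force_release(MockEntry *e)
{
    assert(e->refcount > 0);
    e->refcount--;
    if (e->refcount == 0)
        e->smgr_open = 0;
}

/* "Discard caches" model: an invalidation point clears every entry,
 * including ones still held open, which closes their storage handles. */
static void
discard_all(MockEntry *entries, int n)
{
    for (int i = 0; i < n; i++)
        entries[i].smgr_open = 0;
}

/* btbuildempty-like sequence: the index stays open while a catalog entry
 * is opened and closed; returns 1 if the index handle is still valid. */
static int
index_handle_survives(int discard_caches)
{
    MockEntry   entries[2] = {
        {1, 1},                 /* [0] the index being built, held open */
        {1, 1},                 /* [1] a catalog entry used by a lookup */
    };

    close_with_force_release(&entries[1]);
    if (discard_caches)
        discard_all(entries, 2);
    return entries[0].smgr_open;
}
```

Here index_handle_survives(0) returns 1 and index_handle_survives(1) returns 0: in this model, only the discard-everything policy reaches the index that is still being held open.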

#6 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#5)

On 11/18/22 15:43, Tom Lane wrote:

David Geier <geidav.pg@gmail.com> writes:

On a different note: are we frequently running our test suites with
debug_discard_caches=1 enabled?
It doesn't seem like it.

Hmm. Buildfarm members avocet and trilobite are supposed to be
doing that, but their runtimes of late put the lie to it.
Configuration option got lost somewhere?

Yup, my bad: I forgot to tweak CPPFLAGS when upgrading the buildfarm
client to v12. Fixed; the next run should be with

CPPFLAGS => '-DCLOBBER_CACHE_ALWAYS',

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7 Jakub Wartak
jakub.wartak@enterprisedb.com
In reply to: Tomas Vondra (#6)

Hi hackers,

Mr Lane, thank you for also backporting this to version 13. It seems
to occur in the wild (without debug_discard_caches) for real users
too, when running many "CREATE INDEX i ON
unlogged_table_truncated_after_crash (x,y)" statements, which sometimes
(rarely) result in SIGSEGV (signal 11). I've also reproduced it on 13.9
recently, using "break *btbuildempty" followed by
"call InvalidateSystemCaches()" in gdb.

I'm leaving a partial stack trace here so that others might find it
(note the smgrwrite reln=0x0):

#0 smgrwrite (reln=0x0, forknum=INIT_FORKNUM, blocknum=0,
buffer=0xeef828 "", skipFsync=true) at smgr.c:516
#1 0x00000000004e5492 in btbuildempty (index=0x7f201fc3c7e0) at nbtree.c:178
#2 0x00000000005417f4 in index_build
(heapRelation=heapRelation@entry=0x7f201fc49dd0,
indexRelation=indexRelation@entry=0x7f201fc3c7e0,
indexInfo=indexInfo@entry=0x1159dd8,
#3 0x0000000000542838 in index_create
(heapRelation=heapRelation@entry=0x7f201fc49dd0,
indexRelationName=indexRelationName@entry=0x1159f38 "xxxxxxxx",
indexRelationId=yyyyy..)
#4 0x00000000005db9c8 in DefineIndex
(relationId=relationId@entry=1804880199, stmt=stmt@entry=0xf2fab8,
indexRelationId=indexRelationId@entry=0,
parentIndexId=parentIndexId@entry=0

-Jakub Wartak.

On Fri, Nov 25, 2022 at 9:48 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
