Assertion in pgstat_assoc_relation() fails intermittently

Started by Bharath Rupireddy almost 3 years ago, 2 messages
#1 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com

Hi,

I recently observed an assertion failure [1] a few times on my dev
setup during initdb. The code was built with --enable-debug
--enable-cassert CFLAGS="-ggdb3 -O0". The assertion was gone after I
did make distclean and built the source code again. It looks like the
same relation (pg_type [2]) is linked to multiple relcache entries.
I'm not sure if anyone else has seen this, but thought of reporting it
here. Note that I'm not seeing this issue any more.
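To make the invariant behind the failing assertion concrete, here is a toy model in C. It assumes, loosely following the shape of pgstat_assoc_relation() in pgstat_relation.c, that each table has at most one pending-stats entry and that this entry keeps a back-pointer to the single relcache entry that owns it. A second relcache entry for the same OID (as pg_type appeared to have here) is exactly the case the assertion rejects. The Toy* types and toy_* functions are hypothetical and are not PostgreSQL APIs; the real code asserts instead of returning an error code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for a relcache entry and a
 * pending per-table stats entry; these are NOT the real PostgreSQL structs. */
typedef unsigned int ToyOid;

typedef struct ToyRelation ToyRelation;

typedef struct ToyTableStatus
{
    ToyOid       relid;     /* table this pending-stats entry belongs to */
    ToyRelation *relation;  /* back-pointer to the owning relcache entry, or NULL */
} ToyTableStatus;

struct ToyRelation
{
    ToyOid          rd_id;        /* table OID, e.g. 1247 for pg_type */
    ToyTableStatus *pgstat_info;  /* link to the stats entry, or NULL */
};

#define TOY_MAX_ENTRIES 16
static ToyTableStatus toy_pending[TOY_MAX_ENTRIES];
static int toy_npending = 0;

/* Find or create the single pending-stats entry for relid. */
static ToyTableStatus *
toy_prep_relation_pending(ToyOid relid)
{
    for (int i = 0; i < toy_npending; i++)
        if (toy_pending[i].relid == relid)
            return &toy_pending[i];
    if (toy_npending == TOY_MAX_ENTRIES)
        return NULL;                    /* toy table is full */
    toy_pending[toy_npending].relid = relid;
    toy_pending[toy_npending].relation = NULL;
    return &toy_pending[toy_npending++];
}

/* Link rel to its stats entry.  Returns 0 on success, or -1 if the toy
 * table is full or the entry is already owned by another relcache entry
 * (the case where the real Assert("rel->pgstat_info->relation == NULL")
 * would fire). */
static int
toy_assoc_relation(ToyRelation *rel)
{
    ToyTableStatus *entry;

    assert(rel->pgstat_info == NULL);   /* rel must not be linked yet */

    entry = toy_prep_relation_pending(rel->rd_id);
    if (entry == NULL || entry->relation != NULL)
        return -1;

    rel->pgstat_info = entry;
    entry->relation = rel;              /* mark rel as the owner */
    return 0;
}

/* Drop the link in both directions, e.g. when the relcache entry goes away. */
static void
toy_unlink_relation(ToyRelation *rel)
{
    if (rel->pgstat_info == NULL)
        return;
    assert(rel->pgstat_info->relation == rel);  /* link sanity check */
    rel->pgstat_info->relation = NULL;
    rel->pgstat_info = NULL;
}
```

For example, with two ToyRelation values that both have rd_id = 1247, the second toy_assoc_relation() call returns -1 until the first one is unlinked with toy_unlink_relation().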

[1]:
running bootstrap script ... TRAP: failed Assert("rel->pgstat_info->relation == NULL"), File: "pgstat_relation.c", Line: 143, PID: 837245
/home/ubuntu/postgres/inst/bin/postgres(ExceptionalCondition+0xbb)[0x55d98ff6abc4]
/home/ubuntu/postgres/inst/bin/postgres(pgstat_assoc_relation+0xcd)[0x55d98fdb3db7]
/home/ubuntu/postgres/inst/bin/postgres(+0x1326f5)[0x55d98f8576f5]
/home/ubuntu/postgres/inst/bin/postgres(heap_beginscan+0x17a)[0x55d98f8586b5]
/home/ubuntu/postgres/inst/bin/postgres(table_beginscan_catalog+0x6e)[0x55d98f8c4cf3]
/home/ubuntu/postgres/inst/bin/postgres(+0x1f3d29)[0x55d98f918d29]
/home/ubuntu/postgres/inst/bin/postgres(+0x1f4031)[0x55d98f919031]
/home/ubuntu/postgres/inst/bin/postgres(DefineAttr+0x216)[0x55d98f918375]
/home/ubuntu/postgres/inst/bin/postgres(boot_yyparse+0x115c)[0x55d98f91499c]
/home/ubuntu/postgres/inst/bin/postgres(BootstrapModeMain+0x5cb)[0x55d98f917cda]
/home/ubuntu/postgres/inst/bin/postgres(main+0x2f3)[0x55d98fb63738]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7faac24d8d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7faac24d8e40]
/home/ubuntu/postgres/inst/bin/postgres(_start+0x25)[0x55d98f7f2045]
Aborted (core dumped)
child process exited with exit code 134
initdb: removing data directory "data"

#3 0x0000562d92043da9 in pgstat_assoc_relation (rel=0x562d93dd9bc8) at pgstat_relation.c:148
#4 0x0000562d91ae76f5 in initscan (scan=0x562d93e047f8, key=0x0, keep_startblock=false) at heapam.c:344
#5 0x0000562d91ae86b5 in heap_beginscan (relation=0x562d93dd9bc8, snapshot=0x562d93dfe3f8, nkeys=0, key=0x0, parallel_scan=0x0, flags=961) at heapam.c:1017
#6 0x0000562d91b54cf3 in table_beginscan_catalog (relation=0x562d93dd9bc8, nkeys=0, key=0x0) at tableam.c:119
#7 0x0000562d91ba8d29 in populate_typ_list () at bootstrap.c:719
#8 0x0000562d91ba9031 in gettype (type=0x562d93e047d8 "anyarray") at bootstrap.c:801
#9 0x0000562d91ba8375 in DefineAttr (name=0x562d93e047b8 "attmissingval", type=0x562d93e047d8 "anyarray", attnum=25, nullness=1) at bootstrap.c:521
#10 0x0000562d91ba499c in boot_yyparse () at /home/ubuntu/postgres/src/backend/bootstrap/bootparse.y:438
#11 0x0000562d91ba7cda in BootstrapModeMain (argc=6, argv=0x562d93d4a1d8, check_only=false) at bootstrap.c:370
#12 0x0000562d91df3738 in main (argc=7, argv=0x562d93d4a1d0) at main.c:189

[2]:
(gdb) p *relation
$2 = {rd_locator = {spcOid = 1663, dbOid = 1, relNumber = 1247},
  rd_smgr = 0x562d93e090a8, rd_refcnt = 3, rd_backend = -1,
  rd_islocaltemp = false, rd_isnailed = true, rd_isvalid = true,
  rd_indexvalid = false, rd_statvalid = false, rd_createSubid = 1,
  rd_newRelfilelocatorSubid = 0, rd_firstRelfilelocatorSubid = 0,
  rd_droppedSubid = 0, rd_rel = 0x562d93ddae18, rd_att = 0x562d93dd9dd8,
  rd_id = 1247, rd_lockInfo = {lockRelId = {relId = 1247, dbId = 1}},
  rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, rd_rsdesc = 0x0,
  rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkey = 0x0,
  rd_partkeycxt = 0x0, rd_partdesc = 0x0, rd_pdcxt = 0x0,
  rd_partdesc_nodetached = 0x0, rd_pddcxt = 0x0,
  rd_partdesc_nodetached_xmin = 0, rd_partcheck = 0x0,
  rd_partcheckvalid = false, rd_partcheckcxt = 0x0, rd_indexlist = 0x0,
  rd_pkindex = 0, rd_replidindex = 0, rd_statlist = 0x0,
  rd_attrsvalid = false, rd_keyattr = 0x0, rd_pkattr = 0x0, rd_idattr = 0x0,
  rd_hotblockingattr = 0x0, rd_summarizedattr = 0x0, rd_pubdesc = 0x0,
  rd_options = 0x0, rd_amhandler = 3,
  rd_tableam = 0x562d92582040 <heapam_methods>, rd_index = 0x0,
  rd_indextuple = 0x0, rd_indexcxt = 0x0, rd_indam = 0x0, rd_opfamily = 0x0,
  rd_opcintype = 0x0, rd_support = 0x0, rd_supportinfo = 0x0,
  rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0,
  rd_exclprocs = 0x0, rd_exclstrats = 0x0, rd_indcollation = 0x0,
  rd_opcoptions = 0x0, rd_amcache = 0x0, rd_fdwroutine = 0x0,
  rd_toastoid = 0, pgstat_enabled = true, pgstat_info = 0x562d93d79fb8}
(gdb) p *relation->rd_rel
$3 = {oid = 0, relname = {data = "pg_type", '\000' <repeats 56 times>},
  relnamespace = 11, reltype = 0, reloftype = 0, relowner = 10, relam = 2,
  relfilenode = 0, reltablespace = 0, relpages = 0, reltuples = 0,
  relallvisible = 0, reltoastrelid = 0, relhasindex = false,
  relisshared = false, relpersistence = 112 'p', relkind = 114 'r',
  relnatts = 32, relchecks = 0, relhasrules = false, relhastriggers = false,
  relhassubclass = false, relrowsecurity = false,
  relforcerowsecurity = false, relispopulated = true,
  relreplident = 110 'n', relispartition = false, relrewrite = 0,
  relfrozenxid = 0, relminmxid = 0}
(gdb)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#2 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#1)
Re: Assertion in pgstat_assoc_relation() fails intermittently

At Mon, 27 Mar 2023 11:46:08 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

> I recently observed an assertion failure [1] a few times on my dev
> setup during initdb. The code was built with --enable-debug
> --enable-cassert CFLAGS="-ggdb3 -O0". The assertion was gone after I
> did make distclean and built the source code again. It looks like the
> same relation (pg_type [2]) is linked to multiple relcache entries.
> I'm not sure if anyone else has seen this, but thought of reporting it
> here. Note that I'm not seeing this issue any more.

This seems to be the same issue as [a], which was fixed by commit
cb2e7ddfe5 on Dec 2, 2022.

[a] /messages/by-id/CALDaNm2yXz+zOtv7y5zBd5WKT8O0Ld3YxikuU3dcyCvxF7gypA@mail.gmail.com

a> #5 0x00005590bf283139 in ExceptionalCondition
a> (conditionName=0x5590bf468170 "rel->pgstat_info->relation == NULL",
a> fileName=0x5590bf46812b "pgstat_relation.c", lineNumber=143) at
a> assert.c:66
a> #6 0x00005590bf0ce5f8 in pgstat_assoc_relation (rel=0x7efcce996a48)
a> at pgstat_relation.c:143
a> #7 0x00005590beb83046 in initscan (scan=0x5590bfbf4af8, key=0x0,
a> keep_startblock=false) at heapam.c:343
a> #8 0x00005590beb8466f in heap_beginscan (relation=0x7efcce996a48,
a> snapshot=0x5590bfb5a520, nkeys=0, key=0x0, parallel_scan=0x0,
a> flags=449) at heapam.c:1223

> [1]
> running bootstrap script ... TRAP: failed Assert("rel->pgstat_info->relation == NULL"), File: "pgstat_relation.c", Line: 143, PID: 837245
> /home/ubuntu/postgres/inst/bin/postgres(ExceptionalCondition+0xbb)[0x55d98ff6abc4]
> /home/ubuntu/postgres/inst/bin/postgres(pgstat_assoc_relation+0xcd)[0x55d98fdb3db7]
> /home/ubuntu/postgres/inst/bin/postgres(+0x1326f5)[0x55d98f8576f5]
> /home/ubuntu/postgres/inst/bin/postgres(heap_beginscan+0x17a)[0x55d98f8586b5]
> /home/ubuntu/postgres/inst/bin/postgres(table_beginscan_catalog+0x6e)[0x55d98f8c4cf3]

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center