BUG #17372: Altering statistics during concurrent drop can lead to a server crash

Started by PG Bug reporting formover 4 years ago3 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17372
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 14.1
Operating system: Ubuntu 20.04
Description:

The following script (based on the stats_ext test):
psql -c "CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);"
for i in `seq 100`; do
echo "iteration $i"
( { for n in `seq 20`; do echo "CREATE STATISTICS ab1_a_b_stats ON a, b FROM
ab1; DROP STATISTICS ab1_a_b_stats;"; done } | psql ) >psql1.log 2>&1 &
( { for n in `seq 20`; do echo "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;"; done } | psql ) >psql2.log 2>&1 &

wait
sleep 2
coredumpctl --no-pager && break
done

causes the server crash with the following stacktrace:
Core was generated by `postgres: law regression [local] ALTER STATISTICS
'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055c5f56ff405 in heap_deform_tuple (tuple=0x0,
tupleDesc=0x55c5f6c96a40, values=0x55c5f6c4a1a0,
isnull=0x55c5f6c45e08) at heaptuple.c:1252
1252 HeapTupleHeader tup = tuple->t_data;
(gdb) bt
#0 0x000055c5f56ff405 in heap_deform_tuple (tuple=0x0,
tupleDesc=0x55c5f6c96a40, values=0x55c5f6c4a1a0,
isnull=0x55c5f6c45e08) at heaptuple.c:1252
#1 0x000055c5f56ff127 in heap_modify_tuple (tuple=0x0,
tupleDesc=0x55c5f6c96a40, replValues=0x7ffe4d9f8ee0,
replIsnull=0x7ffe4d9f8ece, doReplace=0x7ffe4d9f8ed7) at
heaptuple.c:1139
#2 0x000055c5f597293a in AlterStatistics (stmt=0x55c5f6c22360) at
statscmds.c:695
#3 0x000055c5f5c679ff in ProcessUtilitySlow (pstate=0x55c5f6c431b0,
pstmt=0x55c5f6c226b0,
queryString=0x55c5f6c218a0 "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at
utility.c:1883
#4 0x000055c5f5c65c13 in standard_ProcessUtility (pstmt=0x55c5f6c226b0,
queryString=0x55c5f6c218a0 "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55c5f6c227a0, qc=0x7ffe4d9f9700)
at utility.c:1066
#5 0x000055c5f5c64b83 in ProcessUtility (pstmt=0x55c5f6c226b0,
queryString=0x55c5f6c218a0 "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at utility.c:527
#6 0x000055c5f5c63422 in PortalRunUtility (portal=0x55c5f6c84210,
pstmt=0x55c5f6c226b0, isTopLevel=true,
setHoldSnapshot=false, dest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at
pquery.c:1155
#7 0x000055c5f5c636ad in PortalRunMulti (portal=0x55c5f6c84210,
isTopLevel=true, setHoldSnapshot=false,
dest=0x55c5f6c227a0, altdest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at
pquery.c:1312
#8 0x000055c5f5c62adc in PortalRun (portal=0x55c5f6c84210,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x55c5f6c227a0, altdest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at
pquery.c:788
#9 0x000055c5f5c5b918 in exec_simple_query (
query_string=0x55c5f6c218a0 "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;") at postgres.c:1214
#10 0x000055c5f5c60807 in PostgresMain (argc=1, argv=0x7ffe4d9f9920,
dbname=0x55c5f6c4d9f8 "regression",
username=0x55c5f6c4d9d8 "law") at postgres.c:4486
#11 0x000055c5f5b84fd6 in BackendRun (port=0x55c5f6c46300) at
postmaster.c:4530
#12 0x000055c5f5b84831 in BackendStartup (port=0x55c5f6c46300) at
postmaster.c:4252
#13 0x000055c5f5b80626 in ServerLoop () at postmaster.c:1745
#14 0x000055c5f5b7fd83 in PostmasterMain (argc=3, argv=0x55c5f6c1b910) at
postmaster.c:1417
#15 0x000055c5f5a6f2e0 in main (argc=3, argv=0x55c5f6c1b910) at main.c:209

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17372: Altering statistics during concurrent drop can lead to a server crash

On 1/20/22 05:00, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 17372
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 14.1
Operating system: Ubuntu 20.04
Description:

The following script (based on the stats_ext test):
psql -c "CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);"
for i in `seq 100`; do
echo "iteration $i"
( { for n in `seq 20`; do echo "CREATE STATISTICS ab1_a_b_stats ON a, b FROM
ab1; DROP STATISTICS ab1_a_b_stats;"; done } | psql ) >psql1.log 2>&1 &
( { for n in `seq 20`; do echo "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;"; done } | psql ) >psql2.log 2>&1 &

wait
sleep 2
coredumpctl --no-pager && break
done

Thanks for the report - I've reproduced it using your script (I had to
add a short wait before the syscache lookup). The issue is fairly
simple, we never check we actually found a tuple in the syscache, i.e.
there's no HeapTupleIsValid() call. We check the OID earlier, but the
tuple may have disappeared since then.

IMO the usual "cache lookup failed" error that we throw in other places
in similar cases is good enough. We already fail with other errors
(tuple concurrently updated/deleted) here.

Patch attached.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

alter-statistics-fix.patchtext/x-patch; charset=UTF-8; name=alter-statistics-fix.patchDownload+2-0
#3Alexander Lakhin
exclusion@gmail.com
In reply to: Tomas Vondra (#2)
Re: BUG #17372: Altering statistics during concurrent drop can lead to a server crash

Hello Tomas,
20.01.2022 20:57, Tomas Vondra wrote:

Thanks for the report - I've reproduced it using your script (I had to
add a short wait before the syscache lookup). The issue is fairly
simple, we never check we actually found a tuple in the syscache, i.e.
there's no HeapTupleIsValid() call. We check the OID earlier, but the
tuple may have disappeared since then.

IMO the usual "cache lookup failed" error that we throw in other places
in similar cases is good enough. We already fail with other errors
(tuple concurrently updated/deleted) here.

Yes, the rough, but simple search:
grep -Porz
'.*=\s*SearchSysCache1.*\n(?!(.*|.*\n.*|.*\n.*\n.*|.*\n.*\n.*\n.*)HeapTupleIsValid)'
./
finds only this place.
Your fix works for me.

Thanks!

Best regards,
Alexander