BUG #16219: EvalPlanQualFetchRowMark segfaults on Updates

Started by PG Bug reporting formabout 6 years ago2 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16219
Logged by: Julian Schauder
Email address: julian.schauder@gmx.de
PostgreSQL version: 12.1
Operating system: Ubuntu 18.04
Description:

Howdy Hackers,

for a week now (upgrade pg11=>12) we're experiencing a bug that causes
segfaults on a daily basis.
I've yet to reproduce this exact behaviour as the querys do not fail all the
time but rather now-and-then
when run concurrently. ( ~4 Crashes so far )

Outputs(identifiers) are anonymised.

2020-01-20 01:39:24.246 CET [2112] LOG: server process (PID 31599) was

terminated by signal 11: Segmentation fault

2020-01-20 01:39:24.246 CET [2112] DETAIL: Failed process was running:

UPDATE partitionedtable ia SET A=A, B=now() FROM temptable tt WHERE
ia.id=tt.ia_id AND A::text IS DISTINCT FROM tt.A::text;

2020-01-20 01:39:24.246 CET [2112] LOG: terminating any other active

server processes

#0 EvalPlanQualFetchRowMark (epqstate=0x5576bfbccfc0, rti=2,

slot=slot@entry=0x5576c888b6c0) at
./build/../src/backend/executor/execMain.c:2590

#1 0x00005576bdf449ce in ExecScanFetch (recheckMtd=0x5576bdf65740

<SeqRecheck>, accessMtd=0x5576bdf656b0 <SeqNext>, node=0x5576c888b500) at
./build/../src/backend/executor/execScan.c:115

#2 ExecScan (node=0x5576c888b500, accessMtd=0x5576bdf656b0 <SeqNext>,

recheckMtd=0x5576bdf65740 <SeqRecheck>) at
./build/../src/backend/executor/execScan.c:200

#3 0x00005576bdf63f7d in ExecProcNode (node=0x5576c888b500) at

./build/../src/include/executor/executor.h:239

#4 ExecNestLoop (pstate=<optimized out>) at

./build/../src/backend/executor/nodeNestloop.c:109

#5 0x00005576bdf3e9a9 in ExecProcNode (node=0x5576c888b310) at

./build/../src/include/executor/executor.h:239

#6 EvalPlanQualNext (epqstate=epqstate@entry=0x5576bfbccfc0) at

./build/../src/backend/executor/execMain.c:2695

#7 0x00005576bdf3ee77 in EvalPlanQual

(epqstate=epqstate@entry=0x5576bfbccfc0,
relation=relation@entry=0x7fbfd57df168, rti=42,
inputslot=inputslot@entry=0x5576cfb66578) at ./build/../src/backend/>
executor/execMain.c:2467

#8 0x00005576bdf61f5e in ExecUpdate

(mtstate=mtstate@entry=0x5576bfbccec8, tupleid=0x7ffdd552f89a, oldtuple=0x0,
slot=<optimized out>, planSlot=planSlot@entry=0x5576c153f428,
epqstate=epqstate@entry=0x5576bfbccfc0, estate=0x5576bf5df080,

canSetTag=true) at

./build/../src/backend/executor/nodeModifyTable.c:1387

#9 0x00005576bdf62df9 in ExecModifyTable (pstate=0x5576bfbccec8) at

./build/../src/backend/executor/nodeModifyTable.c:2223

#10 0x00005576bdf3c23d in ExecProcNode (node=0x5576bfbccec8) at

./build/../src/include/executor/executor.h:239

#11 ExecutePlan (execute_once=<optimized out>, dest=0x5576be63ec00

<donothingDR>, direction=<optimized out>, numberTuples=0,
sendTuples=<optimized out>, operation=CMD_UPDATE,
use_parallel_mode=<optimized out>, planstate=0x5576bfbccec8,

estate=0x5576bf5df080) at

./build/../src/backend/executor/execMain.c:1646

#12 standard_ExecutorRun (queryDesc=0x5576bed15ae0, direction=<optimized

out>, count=0, execute_once=<optimized out>) at
./build/../src/backend/executor/execMain.c:364

#13 0x00007fc89b1a4125 in pgss_ExecutorRun (queryDesc=0x5576bed15ae0,

direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at
./build/../contrib/pg_stat_statements/pg_stat_statements.c:893

#14 0x00005576be0993b5 in ProcessQuery (plan=<optimized out>,
sourceText=0x5576bed159d0 "UPDATE partitionedtable ia SET A=A, B=now()

FROM temptable tt WHERE ia.id=tt.ia_id AND A::text IS DISTINCT FROM
tt.A::text"...,

params=0x0, queryEnv=0x0, dest=0x5576be63ec00 <donothingDR>,

completionTag=0x7ffdd552ff40 "") at
./build/../src/backend/tcop/pquery.c:161

#15 0x00005576be0995f3 in PortalRunMulti

(portal=portal@entry=0x5576bea978f0, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=0x5576be63ec00
<donothingDR>, dest@entry=0x5576bea00890,

altdest=0x5576be63ec00 <donothingDR>, altdest@entry=0x5576bea00890,

completionTag=completionTag@entry=0x7ffdd552ff40 "") at
./build/../src/backend/tcop/pquery.c:1283

#16 0x00005576be09a1ff in PortalRun (portal=portal@entry=0x5576bea978f0,

count=count@entry=1, isTopLevel=isTopLevel@entry=true, run_once=<optimized
out>, dest=dest@entry=0x5576bea00890, altdest=altdest@entry=0x5576bea00890,

completionTag=0x7ffdd552ff40 "") at

./build/../src/backend/tcop/pquery.c:796

#17 0x00005576be097a31 in exec_execute_message (max_rows=1,

portal_name=0x5576bea00480 "") at
./build/../src/backend/tcop/postgres.c:2090

#18 PostgresMain (argc=<optimized out>, argv=argv@entry=0x5576bea2fbd0,

dbname=<optimized out>, username=<optimized out>) at
./build/../src/backend/tcop/postgres.c:4297

#19 0x00005576be01e53b in BackendRun (port=0x5576bea27dc0,

port=0x5576bea27dc0) at
./build/../src/backend/postmaster/postmaster.c:4437

#20 BackendStartup (port=0x5576bea27dc0) at

./build/../src/backend/postmaster/postmaster.c:4128

#21 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1704
#22 0x00005576be01f513 in PostmasterMain (argc=5, argv=0x5576be9f9090) at

./build/../src/backend/postmaster/postmaster.c:1377

#23 0x00005576bdd95cf6 in main (argc=5, argv=0x5576be9f9090) at

./build/../src/backend/main/main.c:228

#0 EvalPlanQualFetchRowMark (epqstate=0x5576bfbccfc0, rti=2,

slot=slot@entry=0x5576c888b6c0) at
./build/../src/backend/executor/execMain.c:2590

earm = 0x5576c7176258
erm = 0x24
datum = <optimized out>
isNull = false
__func__ = "EvalPlanQualFetchRowMark"
#1 0x00005576bdf449ce in ExecScanFetch (recheckMtd=0x5576bdf65740

<SeqRecheck>, accessMtd=0x5576bdf656b0 <SeqNext>, node=0x5576c888b500) at
./build/../src/backend/executor/execScan.c:115

slot = 0x5576c888b6c0
epqstate = <optimized out>
scanrelid = <optimized out>
slot = <optimized out>
estate = 0x5576c888b100
estate = <optimized out>
epqstate = <optimized out>
scanrelid = <optimized out>
slot = <optimized out>
slot = <optimized out>
slot = <optimized out>
slot = <optimized out>
#2 ExecScan (node=0x5576c888b500,

Excerpt shows the pointer 0x24 of erm to be uninitialized'ish. Once i can
capture a second
stacktrace i might be able to confirm if this is indeed a random value or
always 0x24. However,
we're workarounding this by simply not running this query concurrently
anymore hence i might
lack new stacktraces soon.

UPDATE partitionedtable ia SET A=A, B=now() FROM temptable tt WHERE

ia.id=tt.ia_id AND A::text IS DISTINCT FROM tt.A::text;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------

Update on partitionedtable ia (cost=0.43..1202173.35 rows=465660

width=143)

Update on partitionedtable ia
Update on partitionedtable_0062 ia_1
Update on partitionedtable_0061 ia_2
Update on partitionedtable_0060 ia_3
[repeats for another ~400 Partitions]
[..]
-> Nested Loop (cost=0.43..3386.78 rows=1194 width=145)
-> Seq Scan on temptable tt (cost=0.00..22.00 rows=1200

width=46)

-> Index Scan using partitionedtable_pk on partitionedtable ia

(cost=0.43..2.79 rows=1 width=560)

Index Cond: (id = tt.ia_id)
Filter: ((document)::text IS DISTINCT FROM

(tt.document_json)::text)

-> Nested Loop (cost=0.42..3108.99 rows=1194 width=159)
-> Seq Scan on temptable tt (cost=0.00..22.00 rows=1200

width=46)

-> Index Scan using partitionedtable_062_pk on

partitionedtable_0062 ia_1 (cost=0.42..2.56 rows=1 width=518)

Index Cond: (id = tt.ia_id)
Filter: ((document)::text IS DISTINCT FROM

(tt.document_json)::text)

[repeats for another ~400 Partitions]
[..]

JIT:
Functions: 3510
Options: Inlining true, Optimization true, Expressions true, Deforming

true

(2344 rows)

The Query basically updates a massively partitioned* Table from the
contents
of a temptable.

*Inheritance-Partitioning on ID(not modified within the query)
*Inheritance has triggers. ISTM that they are not of any concern.

System is a current Ubuntu18.04 with a freshly pg_upgrade'd postgresql-12.1
pgdg-build,
including new statistics, indices and vacuum.

I'm currently trying to get this to a minimized and reproducible form as i
do not see an
obvious error in the code yet. For the time being i fail to actually
recreate as accurate as
production does :)

regards,
julian

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #16219: EvalPlanQualFetchRowMark segfaults on Updates

PG Bug reporting form <noreply@postgresql.org> writes:

for a week now (upgrade pg11=>12) we're experiencing a bug that causes
segfaults on a daily basis.
I've yet to reproduce this exact behaviour as the querys do not fail all the
time but rather now-and-then
when run concurrently. ( ~4 Crashes so far )

Not sure if you realize that the EvalPlanQual stuff is only called when
there's an uncommitted conflicting update on a target row. So the way
I'd try to reproduce something like this is

session 1 session 2

begin;
issue update query;
issue same update query;
commit;

which should take the race-condition issue out of the equation,
since session 2 will certainly be trying to update the same
rows that session 1 did but hasn't yet committed.

There may be other hard-to-reproduce factors at work, of course.
If you can test on a debug build (with --enable-cassert), that'd
help remove a few more variables.

earm = 0x5576c7176258
erm = 0x24

Excerpt shows the pointer 0x24 of erm to be uninitialized'ish.

Yeah. If you could do "p *earm" here, that might offer some clues;
I'm wondering if that entire struct has gotten trashed (likely because
somebody freed it too soon).

regards, tom lane