BUG #16184: Segmentation Fault during update

Started by PG Bug reporting formover 6 years ago7 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16184
Logged by: Piotr Wlodarczyk
Email address: piotrwlodarczyk89@gmail.com
PostgreSQL version: 12.1
Operating system: Linux dev-g1-db-synchro.gnb.adm 3.10.0-1062.7.1.el
Description:

Hello geeks,

We're preparing to upgrade postgresql from 11.5 to 12.1 but still we have
some bugs and issues. Most of them are our fault, but now we have problem we
don't know how to play with it.

Log:
2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
server process (PID 28879) was terminated by signal 11: Segmentation fault
2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL:
Failed process was running: UPDATE job_abstract a
SET job_status = job_status
WHERE (a.xmax :: TEXT :: BIGINT < (SELECT

txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT

txid_snapshot_xmin(

txid_current_snapshot())

<= (SELECT

txid_snapshot_xmax(

txid_current_snapshot()))))
AND job_status = 0 AND NOT xmax = 0;

Please don't ask me why this update is made in this way - I suppose to set
xmax to 0 but why? I don't know. Its legacy code, being part of core of our
application.

Reproducting this bug is not easy and I don't know which conditions has to
be met to replicate this bug. But during application is working and we made
this update, segmentation fault occurs. But sometimes it works.

Column job_status is telling application that this row should be preceded
one more time. Up to 11.5 this bug doesn't exists.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #16184: Segmentation Fault during update

PG Bug reporting form <noreply@postgresql.org> writes:

Log:
2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
server process (PID 28879) was terminated by signal 11: Segmentation fault
2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL:
Failed process was running: UPDATE job_abstract a
SET job_status = job_status
WHERE (a.xmax :: TEXT :: BIGINT < (SELECT
txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT

Clearly a bug, but there's not much we can do with this amount of
information. If you can provide a stack trace, that'd be helpful.

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

Reproducting this bug is not easy and I don't know which conditions has to
be met to replicate this bug. But during application is working and we made
this update, segmentation fault occurs. But sometimes it works.

My gut instinct is that this might be a concurrent-update problem,
which would explain the lack of reproducibility. Try to get
multiple sessions to update the same table row(s) at the same time.

Also, we had some bugs of that ilk on 12.0, please make sure you
really are updated to 12.1.

regards, tom lane

#3Piotr Włodarczyk
piotrwlodarczyk89@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #16184: Segmentation Fault during update

Hi,

sorry for the late reply, but it was not so easy to get more logs. So what
we have now is listed below:

(gdb) bt
#0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry=0x2ae9520)
at execMain.c:2735
#1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry=0x2ae9520,
relation=relation@entry=0x7f694c76f838, rti=14,
inputslot=inputslot@entry=0x31bc408)
at execMain.c:2454
#2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428,
tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>,
planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520,
estate=estate@entry=0x2aff450,
canSetTag=true) at nodeModifyTable.c:1387
#3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at
nodeModifyTable.c:2223
#4 0x0000000000614302 in ExecProcNode (node=0x2ae9428) at
../../../src/include/executor/executor.h:239
#5 ExecutePlan (execute_once=<optimized out>, dest=0x2f82d88,
direction=<optimized out>, numberTuples=0, sendTuples=false,
operation=CMD_UPDATE, use_parallel_mode=<optimized out>,
planstate=0x2ae9428, estate=0x2aff450) at execMain.c:1646
#6 standard_ExecutorRun (queryDesc=0x2c7c460, direction=<optimized out>,
count=0, execute_once=<optimized out>) at execMain.c:364
#7 0x000000000076365a in ProcessQuery (plan=<optimized out>,
sourceText=0x2a9d7a0 " UPDATE job_abstract a\n SET job_status =
job_status\n WHERE (a.xmax :: TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39
times>, "txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n
"..., params=0x0, queryEnv=0x0,
dest=0x2f82d88, completionTag=0x7ffc94e3d250 "") at pquery.c:161
#8 0x0000000000763882 in PortalRunMulti (portal=portal@entry=0x2be30a0,
isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false,
dest=dest@entry=0x2f82d88, altdest=altdest@entry=0x2f82d88,
completionTag=completionTag@entry=0x7ffc94e3d250 "") at pquery.c:1283
#9 0x00000000007643b5 in PortalRun (portal=portal@entry=0x2be30a0,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x2f82d88,
altdest=altdest@entry=0x2f82d88,
completionTag=completionTag@entry=0x7ffc94e3d250
"") at pquery.c:796
#10 0x00000000007604b5 in exec_simple_query (query_string=0x2a9d7a0 "
UPDATE job_abstract a\n SET job_status = job_status\n WHERE (a.xmax ::
TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39 times>,
"txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n "...) at
postgres.c:1215
#11 0x0000000000761782 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x2ac50a8, dbname=0x2ac4f88 "synchro_test",
username=<optimized out>) at postgres.c:4236
#12 0x0000000000483d47 in BackendRun (port=<optimized out>, port=<optimized
out>) at postmaster.c:4437
#13 BackendStartup (port=0x2abd4c0) at postmaster.c:4128
#14 ServerLoop () at postmaster.c:1704
#15 0x00000000006f106f in PostmasterMain (argc=argc@entry=7,
argv=argv@entry=0x2a98330)
at postmaster.c:1377
#16 0x0000000000484cd3 in main (argc=7, argv=0x2a98330) at main.c:228
(gdb)

Is it enought?

On Thu, Jan 2, 2020 at 4:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

Log:
2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
server process (PID 28879) was terminated by signal 11: Segmentation

fault

2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000

DETAIL:

Failed process was running: UPDATE job_abstract a
SET job_status = job_status
WHERE (a.xmax :: TEXT :: BIGINT < (SELECT
txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT

Clearly a bug, but there's not much we can do with this amount of
information. If you can provide a stack trace, that'd be helpful.

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

Reproducting this bug is not easy and I don't know which conditions has

to

be met to replicate this bug. But during application is working and we

made

this update, segmentation fault occurs. But sometimes it works.

My gut instinct is that this might be a concurrent-update problem,
which would explain the lack of reproducibility. Try to get
multiple sessions to update the same table row(s) at the same time.

Also, we had some bugs of that ilk on 12.0, please make sure you
really are updated to 12.1.

regards, tom lane

--

Pozdrawiam
Piotr Włodarczyk

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Piotr Włodarczyk (#3)
Re: BUG #16184: Segmentation Fault during update

=?UTF-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89@gmail.com> writes:

sorry for the late reply, but it was not so easy to get more logs. So what
we have now is listed below:

(gdb) bt
#0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry=0x2ae9520)
at execMain.c:2735
#1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry=0x2ae9520,
relation=relation@entry=0x7f694c76f838, rti=14,
inputslot=inputslot@entry=0x31bc408)
at execMain.c:2454
#2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428,
tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>,
planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520,
estate=estate@entry=0x2aff450,
canSetTag=true) at nodeModifyTable.c:1387
#3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at
nodeModifyTable.c:2223

OK, so that confirms my suspicion that it was an EvalPlanQual issue
(hence, you need concurrent updates on the same row to trigger it).

I believe this might be the same bug we isolated and fixed just a
couple of days ago:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=87fed2a197abc1397b63ee74b3fa7eb20471fff5

The stack trace looks the same as what Oleksii showed in that thread
for a non-assert build. As far as I know, that bug only triggers if
a single ModifyTable plan node has to run EPQ checks for more than
one target table --- so, is this query updating a partitioned table
or inheritance tree, and is it plausible that it had to update rows
in more than one sub-table, and that some other transaction(s) were
concurrently updating those same rows?

regards, tom lane

#5Piotr Włodarczyk
piotrwlodarczyk89@gmail.com
In reply to: Tom Lane (#4)
Re: BUG #16184: Segmentation Fault during update

Hi,

I don't know what is EvalPlanQual but conditions you wrote about are met.

Table which is updated is main table and it name is job_abstract. The're a
lot of another tables for example job_bca, job_mod, job_smth which extends
job_abstract.

In my case situation is as follows: in the same time it's possible that
some different transactions are in progress (started but not
committed yet). The're some legacy (looks like some kind of guard) that
update under some conditions table job_abstract. In another transaction(s)
table job_bca (which extends job_abstract - inheritance) is updated too.
This two (or more) transactions are updating the same rows, but different
set of fields. I think that author of this update tried to force change
xmax of job_abstract making dumb update job_status = 0 where ... job_status
= 0. But everything works fine to postgres 11.5. When we upgreade to 12.1
segmentation fault occurs. But i saw in commit you linked to description:
"Back-patch to v12 where the bug was introduced".

On Fri, Jan 31, 2020 at 6:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

=?UTF-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89@gmail.com> writes:

sorry for the late reply, but it was not so easy to get more logs. So

what

we have now is listed below:

(gdb) bt
#0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry

=0x2ae9520)

at execMain.c:2735
#1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry

=0x2ae9520,

relation=relation@entry=0x7f694c76f838, rti=14,
inputslot=inputslot@entry=0x31bc408)
at execMain.c:2454
#2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428,
tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>,
planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520,
estate=estate@entry=0x2aff450,
canSetTag=true) at nodeModifyTable.c:1387
#3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at
nodeModifyTable.c:2223

OK, so that confirms my suspicion that it was an EvalPlanQual issue
(hence, you need concurrent updates on the same row to trigger it).

I believe this might be the same bug we isolated and fixed just a
couple of days ago:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=87fed2a197abc1397b63ee74b3fa7eb20471fff5

The stack trace looks the same as what Oleksii showed in that thread
for a non-assert build. As far as I know, that bug only triggers if
a single ModifyTable plan node has to run EPQ checks for more than
one target table --- so, is this query updating a partitioned table
or inheritance tree, and is it plausible that it had to update rows
in more than one sub-table, and that some other transaction(s) were
concurrently updating those same rows?

regards, tom lane

--

Pozdrawiam
Piotr Włodarczyk

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Piotr Włodarczyk (#5)
Re: BUG #16184: Segmentation Fault during update

=?UTF-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89@gmail.com> writes:

I don't know what is EvalPlanQual but conditions you wrote about are met.

OK, then I'm pretty confident that patch will fix your problem.
It'll be in 12.2 which is due out in two weeks.

regards, tom lane

#7Piotr Włodarczyk
piotrwlodarczyk89@gmail.com
In reply to: Tom Lane (#6)
Re: BUG #16184: Segmentation Fault during update

Ok,

thanks.

On Fri, Jan 31, 2020 at 11:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

=?UTF-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89@gmail.com> writes:

I don't know what is EvalPlanQual but conditions you wrote about are met.

OK, then I'm pretty confident that patch will fix your problem.
It'll be in 12.2 which is due out in two weeks.

regards, tom lane

--

Pozdrawiam
Piotr Włodarczyk