BUG #15395: Assert failure when using CURRENT OF with inheritance

Started by PG Bug reporting formover 7 years ago4 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15395
Logged by: Matvey Arye
Email address: mat@timescale.com
PostgreSQL version: 10.4
Operating system: OSX
Description:

The following SQL script causes an Assert failure if PostgreSQL is compiled
with asserts. When run on a PostgreSQL instance with Asserts turned off,
this script seems to work as expected. Thus, I suspect that it may be safe
to just remove the Assert at execCurrent.c:239 but am not familiar enough
with this part of the code to be sure.

The SQL script is:
```
CREATE TABLE current_check (currentid int, payload text);

CREATE TABLE current_check_1 () INHERITS (current_check);
CREATE TABLE current_check_2 () INHERITS (current_check);

INSERT INTO current_check_1 VALUES
(1, 'abc');
INSERT INTO current_check_2 VALUES
(2, 'bcd'),
(3, 'cde'),
(4, 'def');

BEGIN;

DECLARE current_check_cursor SCROLL CURSOR FOR SELECT * FROM
current_check;

FETCH ABSOLUTE 2 FROM current_check_cursor;
FETCH ABSOLUTE 1 FROM current_check_cursor;

DELETE FROM current_check WHERE CURRENT OF current_check_cursor RETURNING
*;

COMMIT;
```

The backtrace for the Assert is:
```
* frame #0: 0x00007fff54a19b66 libsystem_kernel.dylib`__pthread_kill +
10
frame #1: 0x00007fff54be4080 libsystem_pthread.dylib`pthread_kill +
333
frame #2: 0x00007fff549751ae libsystem_c.dylib`abort + 127
frame #3: 0x0000000102581630
postgres`ExceptionalCondition(conditionName="!(((bool) (((const
void*)(current_tid) != ((void*)0)) && ((current_tid)->ip_posid !=0))))",
errorType="FailedAssertion", fileName="execCurrent.c", lineNumber=239) at
assert.c:54
frame #4: 0x00000001021b025f
postgres`execCurrentOf(cexpr=0x00007f890604bf00,
econtext=0x00007f890609c640, table_oid=16402,
current_tid=0x00007ffeedcf4588) atexecCurrent.c:239
frame #5: 0x00000001021ffbd5
postgres`TidListEval(tidstate=0x00007f890609a6b8) at nodeTidscan.c:224
frame #6: 0x00000001021ff4de postgres`TidNext(node=0x00007f890609a6b8)
at nodeTidscan.c:329
frame #7: 0x00000001021cabab
postgres`ExecScanFetch(node=0x00007f890609a6b8, accessMtd=(postgres`TidNext
at nodeTidscan.c:304), recheckMtd=(postgres`TidRecheck at
nodeTidscan.c:420)) at execScan.c:97
frame #8: 0x00000001021ca8d3 postgres`ExecScan(node=0x00007f890609a6b8,
accessMtd=(postgres`TidNext at nodeTidscan.c:304),
recheckMtd=(postgres`TidRecheck at nodeTidscan.c:420)) at execScan.c:164
frame #9: 0x00000001021ff082
postgres`ExecTidScan(pstate=0x00007f890609a6b8) at nodeTidscan.c:453
frame #10: 0x00000001021c8d72
postgres`ExecProcNodeFirst(node=0x00007f890609a6b8) at execProcnode.c:430
frame #11: 0x00000001021f27f2
postgres`ExecProcNode(node=0x00007f890609a6b8) at executor.h:250
frame #12: 0x00000001021f182f
postgres`ExecModifyTable(pstate=0x00007f8906098f00) at
nodeModifyTable.c:1645
frame #13: 0x00000001021c5a82
postgres`ExecProcNode(node=0x00007f8906098f00) at executor.h:250
frame #14: 0x00000001021c1813
postgres`ExecutePlan(estate=0x00007f8906098840,
planstate=0x00007f8906098f00, use_parallel_mode='\0', operation=CMD_DELETE,
sendTuples='\x01', numberTuples=0, direction=ForwardScanDirection,
dest=0x00007f8906000278, execute_once='\x01') at execMain.c:1722
frame #15: 0x00000001021c16db
postgres`standard_ExecutorRun(queryDesc=0x00007f8906000310,
direction=ForwardScanDirection, count=0, execute_once='\x01') at
execMain.c:363
frame #16: 0x00000001021c1497
postgres`ExecutorRun(queryDesc=0x00007f8906000310,
direction=ForwardScanDirection, count=0, execute_once='\x01') at
execMain.c:306
frame #17: 0x00000001023cfc4e
postgres`ProcessQuery(plan=0x00007f890608eea0, sourceText="DELETE FROM
current_check WHERE CURRENT OF current_check_cursor RETURNING *;",
params=0x0000000000000000, queryEnv=0x0000000000000000,
dest=0x00007f8906000278, completionTag="") at pquery.c:161
frame #18: 0x00000001023cecb8
postgres`PortalRunMulti(portal=0x00007f8903806040, isTopLevel='\x01',
setHoldSnapshot='\x01', dest=0x00007f8906000278, altdest=0x00000001027d3348,
completionTag="") at pquery.c:1286
frame #19: 0x00000001023ce6b4
postgres`FillPortalStore(portal=0x00007f8903806040, isTopLevel='\x01') at
pquery.c:1033
frame #20: 0x00000001023ce285
postgres`PortalRun(portal=0x00007f8903806040, count=9223372036854775807,
isTopLevel='\x01', run_once='\x01', dest=0x00007f890608ef98,
altdest=0x00007f890608ef98, completionTag="") at pquery.c:768
frame #21: 0x00000001023c9c2c
postgres`exec_simple_query(query_string="DELETE FROM current_check WHERE
CURRENT OF current_check_cursor RETURNING *;") at postgres.c:1099
frame #22: 0x00000001023c8e37 postgres`PostgresMain(argc=1,
argv=0x00007f8905006d10, dbname="single", username="default_perm_user") at
postgres.c:4088
frame #23: 0x000000010230f08e
postgres`BackendRun(port=0x00007f89049006c0) at postmaster.c:4405
frame #24: 0x000000010230e403
postgres`BackendStartup(port=0x00007f89049006c0) at postmaster.c:4077
frame #25: 0x000000010230d3e5 postgres`ServerLoop at postmaster.c:1755
frame #26: 0x000000010230ab25 postgres`PostmasterMain(argc=8,
argv=0x00007f8902c03160) at postmaster.c:1363
frame #27: 0x000000010222e529 postgres`main(argc=8,
argv=0x00007f8902c03160) at main.c:228
```

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #15395: Assert failure when using CURRENT OF with inheritance

=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:

The following SQL script causes an Assert failure if PostgreSQL is compiled
with asserts.

Nice catch! This has evidently been broken for a long time.

When run on a PostgreSQL instance with Asserts turned off,
this script seems to work as expected. Thus, I suspect that it may be safe
to just remove the Assert at execCurrent.c:239 but am not familiar enough
with this part of the code to be sure.

No, I don't think that's a great fix. After tracing through it, I can
see that what's going wrong is:

1. The plan tree for the cursor looks like

EXPLAIN DECLARE current_check_cursor SCROLL CURSOR FOR SELECT * FROM current_check;
QUERY PLAN
--------------------------------------------------------------------------
Append (cost=0.00..58.11 rows=2541 width=36)
-> Seq Scan on current_check (cost=0.00..0.00 rows=1 width=36)
-> Seq Scan on current_check_1 (cost=0.00..22.70 rows=1270 width=36)
-> Seq Scan on current_check_2 (cost=0.00..22.70 rows=1270 width=36)
(4 rows)

2. The "FETCH ABSOLUTE 2" command runs through the single tuple available
from the seqscans on current_check and current_check_1, and then stops
on the first tuple available from current_check_2.

3. The "FETCH ABSOLUTE 1" command rewinds the executor and fetches the
first available tuple, which is in current_check_1.

4. At this point, the SeqScan node for current_check has been run to
completion, so its ScanTupleSlot is empty. The node for current_check_1
is correctly positioned on a tuple. The node for current_check_2 has
not been visited yet, so its state is as ExecReScan left it. The trouble
is that the only thing that's happened to its ScanTupleSlot is where
heapam.c's initscan() does

ItemPointerSetInvalid(&scan->rs_ctup.t_self);

It's only rather accidental that the plan node's ScanTupleSlot is
pointing at that at all, but it is. So execCurrent.c sees an apparently
valid scan tuple containing an invalid t_self, and it spits up, which
I think is entirely a good thing.

I think that the correct fix is to ensure that the ScanTupleSlot gets
cleared during ExecReScan, which fortunately is pretty easy to mechanize
because all scan node types call ExecScanReScan. So the execScan.c
part of the patch below seems to be enough to resolve the reported case.

However, we're not really out of the woods yet, because I noticed that
places like ExecReScanAppend will postpone calling ExecReScan if there's
a chgParam flag set on their sub-nodes. This means that we can't
necessarily rely on ExecutorRewind performing an immediate call of
ExecReScan for every scan node in the plan. It seems to me, therefore,
that we need to teach execCurrent.c to check for pending-rescan flags
and not believe that a node is really on a tuple if it finds such a
flag at or above the node. The execCurrent.c part of the attached patch
does that.

I've not tried yet to create an actual test case for the chgParam-based
failure. It's definitely possible that that problem is only hypothetical
at the moment because cursor plans that would satisfy search_plan_tree
would be too simple to contain any such flags. But I think we'd better
add that logic anyway.

regards, tom lane

Attachments:

fix-bug-15395.patchtext/x-diff; charset=us-ascii; name=fix-bug-15395.patchDownload+52-16
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: BUG #15395: Assert failure when using CURRENT OF with inheritance

I wrote:

I've not tried yet to create an actual test case for the chgParam-based
failure. It's definitely possible that that problem is only hypothetical
at the moment because cursor plans that would satisfy search_plan_tree
would be too simple to contain any such flags. But I think we'd better
add that logic anyway.

After fooling with that for awhile longer, indeed there doesn't seem to
be a way to cause such a failure today. The closest thing I was able
to find was that an InitPlan's output params could get marked for
recalculation, resulting in chgParam getting set on nodes that reference
the InitPlan. A cursor query could certainly contain an InitPlan ---
but there doesn't seem to be any code path that would result in forcing
recalculation of one, except with things that couldn't be in a simple
cursor plan, such as NestLoop joins. Still, it's not real hard to
conceive of future changes that would render this a live issue, so
I still feel that the search_plan_tree change is a good idea.

So, I've added a test case based on your example and pushed the patch.

BTW, I don't understand your report that just removing the Assert made
it work for you. That would've resulted in returning an invalid TID
to nodeTidscan.c, which AFAICS should've ended up with an ereport about
a bad block number in heap_fetch.

regards, tom lane

#4Mat Arye
mat@timescale.com
In reply to: Tom Lane (#3)
Re: BUG #15395: Assert failure when using CURRENT OF with inheritance

Sorry for the delayed response. The patch looks good to me and your
explanation makes sense.

I just re-ran my original test sql script on an official docker image of
postgres (which obviously has asserts off) and didn't get any errors + the
correct row is deleted.
I can investigate further if you think it's worthwhile.

Thanks for all your help.

-Mat

On Sun, Sep 23, 2018 at 4:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

I wrote:

I've not tried yet to create an actual test case for the chgParam-based
failure. It's definitely possible that that problem is only hypothetical
at the moment because cursor plans that would satisfy search_plan_tree
would be too simple to contain any such flags. But I think we'd better
add that logic anyway.

After fooling with that for awhile longer, indeed there doesn't seem to
be a way to cause such a failure today. The closest thing I was able
to find was that an InitPlan's output params could get marked for
recalculation, resulting in chgParam getting set on nodes that reference
the InitPlan. A cursor query could certainly contain an InitPlan ---
but there doesn't seem to be any code path that would result in forcing
recalculation of one, except with things that couldn't be in a simple
cursor plan, such as NestLoop joins. Still, it's not real hard to
conceive of future changes that would render this a live issue, so
I still feel that the search_plan_tree change is a good idea.

So, I've added a test case based on your example and pushed the patch.

BTW, I don't understand your report that just removing the Assert made
it work for you. That would've resulted in returning an invalid TID
to nodeTidscan.c, which AFAICS should've ended up with an ereport about
a bad block number in heap_fetch.

regards, tom lane