Core dump with nested CREATE TEMP TABLE
I don't have an independent reproduction yet (though make test in [1] should
reproduce this), and I haven't been able to reproduce it by hand, but pgTap
has something to do with it. This affects at least 9.4 and a fairly recent HEAD.
-- Bits from top of test/sql/base.sql
\i test/helpers/setup.sql
SET ROLE = DEFAULT;
CREATE ROLE test_role;
GRANT USAGE ON SCHEMA tap TO test_role;
GRANT test_role TO test_factory__owner;
CREATE SCHEMA test AUTHORIZATION test_role;
SET ROLE = test_role;
SET search_path = test, tap;
\i test/helpers/create.sql
SELECT tf.register(
'customer'
, array[
row(
'insert'
, $$INSERT INTO customer VALUES (DEFAULT, 'first', 'last' )
RETURNING *$$
)::tf.test_set
, row(
'function'
, $$SELECT * FROM customer__add( 'func first', 'func last' )$$
)::tf.test_set
]
);
SELECT tf.register(
'invoice'
, array[
row(
'base'
, $$INSERT INTO invoice VALUES(
DEFAULT
, (tf.get( NULL::customer, 'insert' )).customer_id
, current_date
, current_date + 30
) RETURNING *$$
)::tf.test_set
]
);
SELECT no_plan();
SELECT lives_ok($$SELECT * FROM tf.get( NULL::invoice, 'base' )$$);
not ok 1
# Failed test 1
# died: 42703: column "c_data_table_name" does not exist
-- Ok, got an error, but no crash. But now...
SELECT results_eq(
$$SELECT * FROM tf.get( NULL::invoice, 'base' )$$
, $$VALUES( 1, 1, current_date, current_date + 30 )$$
, 'invoice factory output'
);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
lives_ok() and results_eq() are both pgTap functions. Definitions are at [2]
and [3]; the latter is called by
https://github.com/theory/pgtap/blob/master/sql/pgtap.sql.in#L6591.
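For reference, the call shape involved is roughly the following (a simplified,
hypothetical sketch with made-up names, not the actual pgTap source; see [2]
and [3] for the real definitions):

-- The (refcursor,refcursor,text) variant fetches from both cursors inside a
-- block with an exception handler; that handler is where the "during
-- exception cleanup" frames in the log below come from.
CREATE FUNCTION sketch_results_eq(have refcursor, want refcursor, descr text)
RETURNS text LANGUAGE plpgsql AS $f$
DECLARE
    have_rec record;
    want_rec record;
BEGIN
    FETCH have INTO have_rec;  -- running the query calls tf.get(), which
    FETCH want INTO want_rec;  -- creates temp tables of its own on first use
    RETURN 'ok ' || descr;
EXCEPTION WHEN OTHERS THEN
    RETURN 'not ok ' || descr || E'\n# died: ' || SQLSTATE;
END;
$f$;

-- The (text,text,text) variant opens two cursors and assigns the result of
-- the refcursor variant; that assignment is the "line 9 at assignment"
-- frame in the log below.
CREATE FUNCTION sketch_results_eq(sql_have text, sql_want text, descr text)
RETURNS text LANGUAGE plpgsql AS $f$
DECLARE
    have refcursor;
    want refcursor;
    msg  text;
BEGIN
    OPEN have FOR EXECUTE sql_have;
    OPEN want FOR EXECUTE sql_want;
    msg := sketch_results_eq(have, want, descr);
    RETURN msg;
END;
$f$;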
I've attached a full server log from running make test, but the most
relevant bit is below:
DEBUG: AbortSubTransaction
CONTEXT: PL/pgSQL function results_eq(refcursor,refcursor,text) line 11
during exception cleanup
PL/pgSQL function results_eq(text,text,text) line 9 at assignment
DEBUG: name: unnamed; blockState: INPROGRESS; state: INPROGR,
xid/subid/cid: 1980/1/979, nestlvl: 1, children: 1981 1982 1983 1984
1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
CONTEXT: PL/pgSQL function results_eq(refcursor,refcursor,text) line 11
during exception cleanup
PL/pgSQL function results_eq(text,text,text) line 9 at assignment
DEBUG: name: pg_psql_temporary_savepoint; blockState: SUB INPROGRS;
state: INPROGR, xid/subid/cid: 2000/34/979, nestlvl: 2, children:
CONTEXT: PL/pgSQL function results_eq(refcursor,refcursor,text) line 11
during exception cleanup
PL/pgSQL function results_eq(text,text,text) line 9 at assignment
DEBUG: name: unnamed; blockState: SUB INPROGRS; state: INPROGR,
xid/subid/cid: 2001/35/979, nestlvl: 3, children:
CONTEXT: PL/pgSQL function results_eq(refcursor,refcursor,text) line 11
during exception cleanup
PL/pgSQL function results_eq(text,text,text) line 9 at assignment
TRAP: FailedAssertion("!(rebuild ? !((bool)((relation)->rd_refcnt == 0))
: ((bool)((relation)->rd_refcnt == 0)))", File: "relcache.c", Line: 2055)
[1]: https://github.com/BlueTreble/test_factory/tree/crash
[2]: https://github.com/theory/pgtap/blob/master/sql/pgtap.sql.in#L746
[3]: https://github.com/theory/pgtap/blob/master/sql/pgtap.sql.in#L6541
(which is called by https://github.com/theory/pgtap/blob/master/sql/pgtap.sql.in#L6591)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Looks like a 98k file won't get through the list...
On Sat, Aug 29, 2015 at 5:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Looks like a 98k file won't get through the list...
Is it compressed? Note that we sometimes have larger patches than that, but
perhaps those were given special permission by the admins of this list.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> On Sat, Aug 29, 2015 at 5:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> Looks like a 98k file won't get through the list...
> Is it compressed? Note that we sometimes have larger patches than that, but
> perhaps those were given special permission by the admins of this list.

Messages significantly larger than that go through all the time. Maybe
you had it marked with some weird MIME type?
regards, tom lane
On 8/28/15 8:39 PM, Tom Lane wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Sat, Aug 29, 2015 at 5:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>> Looks like a 98k file won't get through the list...
>> Is it compressed? Note that we sometimes have larger patches than that, but
>> perhaps those were given special permission by the admins of this list.
> Messages significantly larger than that go through all the time. Maybe
> you had it marked with some weird MIME type?
Apparently the original email did go through and my MUA search just
failed to find it. Sorry for the noise.
Original email:
/messages/by-id/55DFAF18.4060601@BlueTreble.com
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Sat, Aug 29, 2015 at 11:18 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 8/28/15 8:39 PM, Tom Lane wrote:
>> Michael Paquier <michael.paquier@gmail.com> writes:
>>> On Sat, Aug 29, 2015 at 5:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>>> Looks like a 98k file won't get through the list...
>>> Is it compressed? Note that we sometimes have larger patches than that, but
>>> perhaps those were given special permission by the admins of this list.
>> Messages significantly larger than that go through all the time. Maybe
>> you had it marked with some weird MIME type?
> Apparently the original email did go through and my MUA search just failed
> to find it. Sorry for the noise.
> Original email:
> /messages/by-id/55DFAF18.4060601@BlueTreble.com
Ah, OK, you meant this file... Yes, I was able to receive it as well in your
original email. I'll try to investigate further later, but Tom may beat me
to it first. He usually does.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> Ah, OK, you meant this file... Yes, I was able to receive it as well in your
> original email. I'll try to investigate further later, but Tom may beat me
> to it first. He usually does.
Jim had indicated the bug wasn't reproducible on the strength of that
info, so I was waiting for him to provide a more reproducible case.
regards, tom lane
On 8/29/15 8:04 AM, Tom Lane wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Ah, OK, you meant this file... Yes, I was able to receive it as well in your
>> original email. I'll try to investigate further later, but Tom may beat me
>> to it first. He usually does.
> Jim had indicated the bug wasn't reproducible on the strength of that
> info, so I was waiting for him to provide a more reproducible case.
It was reproducible, just not very self-contained. [1] is a bit better,
but it still involves pgTap as well as test_factory.
Steps to reproduce:
1. Download https://github.com/BlueTreble/test_factory/archive/crash.zip
2. Unzip and cd into the directory
3. pgxn install pgtap (or just make test)
4. make install, if you didn't do make test
5. psql -f crash.sql
Stack trace below. Relevant assert:
1967      * As per notes above, a rel to be rebuilt MUST have refcnt > 0; while
1968      * of course it would be a bad idea to blow away one with nonzero refcnt.
1969      */
-> 1970   Assert(rebuild ?
   1971          !RelationHasReferenceCountZero(relation) :
   1972          RelationHasReferenceCountZero(relation));
Relevant bits of relation:
◆─(Relation) relation = 0x00000001165832a8
  ├─◆─(RelFileNode) rd_node
  ├─◆─(SMgrRelationData *) rd_smgr = 0x0000000000000000
  ├─(int) rd_refcnt = 1
  ├─(BackendId) rd_backend = 2
  ├─(bool) rd_islocaltemp = '\x01'
  ├─(bool) rd_isnailed = '\0'
  ├─(bool) rd_isvalid = '\x01'
  ├─(char) rd_indexvalid = '\0'
  ├─(SubTransactionId) rd_createSubid = 16
  ├─(SubTransactionId) rd_newRelfilenodeSubid = 0
  ├─◆─(Form_pg_class) rd_rel = 0x00000001165838d8
  │   ├─◆─(NameData) relname
  │   │   └─◆─(char [64]) data "invoice_0000000003"
rebuild is 0, so the assert fails because we're clearing (not rebuilding)
the relcache entry for this temp table while something still holds a
reference to it (rd_refcnt = 1).
[1]: https://github.com/BlueTreble/test_factory/blob/crash/crash.sql
(lldb) bt
* thread #1: tid = 0x3b03a0, 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff9001b35c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x00007fff8cf89b1a libsystem_c.dylib`abort + 125
frame #3: 0x000000010cdb4039 postgres`ExceptionalCondition(conditionName=0x000000010cf59cfd, errorType=0x000000010cec392d, fileName=0x000000010cf59045, lineNumber=1972) + 137 at assert.c:54
frame #4: 0x000000010cd9b332 postgres`RelationClearRelation(relation=0x0000000116594cd8, rebuild='\0') + 162 at relcache.c:1970
frame #5: 0x000000010cd9cc0f postgres`AtEOSubXact_cleanup(relation=0x0000000116594cd8, isCommit='\0', mySubid=15, parentSubid=14) + 79 at relcache.c:2665
frame #6: 0x000000010cd9cb92 postgres`AtEOSubXact_RelationCache(isCommit='\0', mySubid=15, parentSubid=14) + 242 at relcache.c:2633
frame #7: 0x000000010c9b6e88 postgres`AbortSubTransaction + 440 at xact.c:4373
frame #8: 0x000000010c9b8ef2 postgres`RollbackAndReleaseCurrentSubTransaction + 178 at xact.c:3948
frame #9: 0x0000000116295c93 plpgsql.so`exec_stmt_block(estate=0x00007fff532bfa50, block=0x00007fa3f56ab600) + 1443 at pl_exec.c:1214
frame #10: 0x0000000116294625 plpgsql.so`plpgsql_exec_function(func=0x00007fa3f5369c10, fcinfo=0x00007fa3f5634160, simple_eval_estate=0x0000000000000000) + 837 at pl_exec.c:344
frame #11: 0x000000011628dbb6 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fa3f5634160) + 470 at pl_handler.c:243
frame #12: 0x000000010caf2471 postgres`ExecMakeFunctionResultNoSets(fcache=0x00007fa3f56340f0, econtext=<unavailable>, isNull=0x00007fff532bfd5b, isDone=<unavailable>) + 209 at execQual.c:2026
frame #13: 0x0000000116299772 plpgsql.so`exec_eval_simple_expr(estate=0x00007fff532c0090, expr=0x00007fa3f53a1e60, result=0x00007fff532bfd10, isNull=0x00007fff532bfd5b, rettype=0x00007fff532bfd5c) + 562 at pl_exec.c:5203
frame #14: 0x000000011629911d plpgsql.so`exec_eval_expr(estate=0x00007fff532c0090, expr=0x00007fa3f53a1e60, isNull=0x00007fff532bfd5b, rettype=0x00007fff532bfd5c) + 93 at pl_exec.c:4790
frame #15: 0x000000011629ac70 plpgsql.so`exec_assign_expr(estate=0x00007fff532c0090, target=0x00007fa3f541c100, expr=0x00007fa3f53a1e60) + 48 at pl_exec.c:3992
frame #16: 0x000000011629b300 plpgsql.so`exec_stmt_assign(estate=0x00007fff532c0090, stmt=0x00007fa3f53a1f50) + 128 at pl_exec.c:1511
frame #17: 0x000000011629af6c plpgsql.so`exec_stmt(estate=0x00007fff532c0090, stmt=0x00007fa3f53a1f50) + 252 at pl_exec.c:1399
frame #18: 0x000000011629ad1e plpgsql.so`exec_stmts(estate=0x00007fff532c0090, stmts=0x00007fa3f53a1bd0) + 126 at pl_exec.c:1362
frame #19: 0x0000000116295f1f plpgsql.so`exec_stmt_block(estate=0x00007fff532c0090, block=0x00007fa3f53a21d0) + 2095 at pl_exec.c:1300
frame #20: 0x0000000116294625 plpgsql.so`plpgsql_exec_function(func=0x00007fa3f52b1a30, fcinfo=0x00007fa3f53f9500, simple_eval_estate=0x0000000000000000) + 837 at pl_exec.c:344
frame #21: 0x000000011628dbb6 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fa3f53f9500) + 470 at pl_handler.c:243
frame #22: 0x000000010caf2471 postgres`ExecMakeFunctionResultNoSets(fcache=0x00007fa3f53f9490, econtext=<unavailable>, isNull=0x00007fa3f53f9f98, isDone=<unavailable>) + 209 at execQual.c:2026
frame #23: 0x000000010caf1a4d postgres`ExecProject [inlined] ExecTargetList(targetlist=<unavailable>, econtext=<unavailable>, values=<unavailable>, isnull=<unavailable>, itemIsDone=<unavailable>, isDone=<unavailable>) + 90 at execQual.c:5307
frame #24: 0x000000010caf19f3 postgres`ExecProject(projInfo=<unavailable>, isDone=0x00007fff532c033c) + 323 at execQual.c:5522
frame #25: 0x000000010cb07233 postgres`ExecResult(node=0x00007fa3f53f9150) + 179 at nodeResult.c:155
frame #26: 0x000000010caeae1c postgres`ExecProcNode(node=0x00007fa3f53f9150) + 92 at execProcnode.c:373
frame #27: 0x000000010cae6836 postgres`ExecutePlan(estate=0x00007fa3f53f9038, planstate=0x00007fa3f53f9150, operation=CMD_SELECT, sendTuples='\x01', numberTuples=0, direction=ForwardScanDirection, dest=0x00007fa3f53cd740) + 118 at execMain.c:1490
frame #28: 0x000000010cae675b postgres`standard_ExecutorRun(queryDesc=0x00007fa3f54f3638, direction=ForwardScanDirection, count=0) + 523 at execMain.c:319
frame #29: 0x000000010cae653d postgres`ExecutorRun(queryDesc=0x00007fa3f54f3638, direction=ForwardScanDirection, count=0) + 77 at execMain.c:267
frame #30: 0x000000010cc2a2b7 postgres`PortalRunSelect(portal=0x00007fa3f3012638, forward=<unavailable>, count=0, dest=<unavailable>) + 263 at pquery.c:946
frame #31: 0x000000010cc29dd5 postgres`PortalRun(portal=0x00007fa3f3012638, count=9223372036854775807, isTopLevel=<unavailable>, dest=0x00007fa3f53cd740, altdest=<unavailable>, completionTag=0x00007fff532c06d0) + 469 at pquery.c:790
frame #32: 0x000000010cc25b86 postgres`exec_simple_query(query_string=0x00007fa3f300e638) + 1302 at postgres.c:1072
frame #33: 0x000000010cc24def postgres`PostgresMain(argc=1, argv=0x00007fa3f401c5f0, dbname=0x00007fa3f401c450, username=0x00007fa3f30031f8) + 2895 at postgres.c:4074
frame #34: 0x000000010cb9da48 postgres`PostmasterMain [inlined] BackendRun + 8024 at postmaster.c:4155
frame #35: 0x000000010cb9da22 postgres`PostmasterMain [inlined] BackendStartup at postmaster.c:3829
frame #36: 0x000000010cb9da22 postgres`PostmasterMain [inlined] ServerLoop at postmaster.c:1597
frame #37: 0x000000010cb9da22 postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) + 7986 at postmaster.c:1244
frame #38: 0x000000010cb218cd postgres`main(argc=<unavailable>, argv=<unavailable>) + 1325 at main.c:228
frame #39: 0x00007fff8e9a35fd libdyld.dylib`start + 1
(lldb)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Hi,
On 2015-08-29 11:06:05 -0500, Jim Nasby wrote:
> Stack trace below. Relevant assert:
What exact revision is this on? Either there's some aggressive inlining
going on, or these lines don't entirely match up with HEAD.
That's compiled with optimization, isn't it? Could you compile with -O0?
Can you go up to
frame #19: 0x0000000116295f1f plpgsql.so`exec_stmt_block(estate=0x00007fff532c0090, block=0x00007fa3f53a21d0) + 2095 at pl_exec.c:1300
and do 'disass/m' to see where we actually are?
The area around #19 is likely the PG_CATCH inside exec_stmt_block. So
what's happening is that an exception is raised while handling the
previous exception, and apparently we're not doing that entirely
right. Besides the line numbers, the "during exception cleanup" context
hints at that.
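To make that concrete, the suspected pattern is roughly the following (a
minimal hypothetical sketch based on the analysis above, not a confirmed
standalone reproducer; the real case goes through pgTap and test_factory):

DO $do$
BEGIN
    BEGIN
        PERFORM 1 / 0;  -- first exception
    EXCEPTION WHEN OTHERS THEN
        -- While handling the first exception, create a temp table and then
        -- fail again. The outer block's handler then rolls back a
        -- subtransaction (RollbackAndReleaseCurrentSubTransaction) that
        -- owns the relcache entry for the just-created temp table, i.e.
        -- "during exception cleanup".
        CREATE TEMP TABLE sketch_invoice (id int);
        RAISE EXCEPTION 'second error, raised during exception cleanup';
    END;
EXCEPTION WHEN OTHERS THEN
    NULL;  -- the outer handler (pgTap's, in the real case) swallows it
END;
$do$;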
Greetings,
Andres Freund
On 8/29/15 11:31 AM, Andres Freund wrote:
> Hi,
> On 2015-08-29 11:06:05 -0500, Jim Nasby wrote:
>> Stack trace below. Relevant assert:
> What exact revision is this on? Either there's some aggressive inlining
> going on, or these lines don't entirely match up with HEAD.
Oops, that was 9.4.1. Trace from freshly pulled HEAD below, compiled with -O0.
> That's compiled with optimization, isn't it? Could you compile with -O0?
> Can you go up to
> frame #19: 0x0000000116295f1f plpgsql.so`exec_stmt_block(estate=0x00007fff532c0090, block=0x00007fa3f53a21d0) + 2095 at pl_exec.c:1300
> and do 'disass/m' to see where we actually are?
That's now frame 20 instead of 19. disass/m gives an error (note: my gdb is
the Apple-tweaked one, and everything else I've posted here is from lldb).
Output from 'disassemble' in frame 20 is at the bottom.
> The area around #19 is likely the PG_CATCH inside exec_stmt_block. So
> what's happening is that an exception is raised while handling the
> previous exception, and apparently we're not doing that entirely
> right. Besides the line numbers, the "during exception cleanup" context
> hints at that.
If the line numbers and the 'gui' mode of lldb are to be believed, frame
20 is NOT in the try block, but #9 is:
   1242     PG_CATCH();
   1243     {
   1244         ErrorData  *edata;
   1245         ListCell   *e;
   1246
   1247         estate->err_text = gettext_noop("during exception cleanup");
   1248
   1249         /* Save error info */
   1250         MemoryContextSwitchTo(oldcontext);
   1251         edata = CopyErrorData();
   1252         FlushErrorState();
   1253
   1254         /* Abort the inner transaction */
-> 1255         RollbackAndReleaseCurrentSubTransaction();
Looking at frame 10:
432 */
433 estate.err_text = NULL;
434 estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-> 435 rc = exec_stmt_block(&estate, func->action);
436 if (rc != PLPGSQL_RC_RETURN)
437 {
438 estate.err_stmt = NULL;
func->fn_signature is results_eq(refcursor,refcursor,text), which is a
pgTap function that has an exception handler, and I know the code it's
calling also has an exception handler.
(lldb) bt
* thread #1: tid = 0x3c8fb4, 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff9001b35c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x00007fff8cf89b1a libsystem_c.dylib`abort + 125
frame #3: 0x000000010fcf3b99 postgres`ExceptionalCondition(conditionName=0x000000010fe157e9, errorType=0x000000010fd60053, fileName=0x000000010fe14a7f, lineNumber=2055) + 137 at assert.c:54
frame #4: 0x000000010fcda0ac postgres`RelationClearRelation(relation=0x000000011954f388, rebuild='\0') + 140 at relcache.c:2053
frame #5: 0x000000010fcdb8cf postgres`AtEOSubXact_cleanup(relation=0x000000011954f388, isCommit='\0', mySubid=16, parentSubid=15) + 79 at relcache.c:2752
frame #6: 0x000000010fcdb852 postgres`AtEOSubXact_RelationCache(isCommit='\0', mySubid=16, parentSubid=15) + 242 at relcache.c:2720
frame #7: 0x000000010f7dd096 postgres`AbortSubTransaction + 486 at xact.c:4606
frame #8: 0x000000010f7df322 postgres`RollbackAndReleaseCurrentSubTransaction + 178 at xact.c:4163
frame #9: 0x0000000119245a29 plpgsql.so`exec_stmt_block(estate=0x00007fff50509138, block=0x00007fcc4d0367b0) + 1449 at pl_exec.c:1255
frame #10: 0x000000011924403b plpgsql.so`plpgsql_exec_function(func=0x00007fcc4d145b18, fcinfo=0x00007fcc4d025ef8, simple_eval_estate=0x0000000000000000) + 1163 at pl_exec.c:435
frame #11: 0x000000011923d5c6 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fcc4d025ef8) + 470 at pl_handler.c:253
frame #12: 0x000000010f98cbc9 postgres`ExecMakeFunctionResultNoSets(fcache=0x00007fcc4d025e88, econtext=0x00007fcc4d124350, isNull=0x00007fff505094ff, isDone=0x0000000000000000) + 377 at execQual.c:2019
frame #13: 0x000000010f98686d postgres`ExecEvalFunc(fcache=0x00007fcc4d025e88, econtext=0x00007fcc4d124350, isNull=0x00007fff505094ff, isDone=0x0000000000000000) + 205 at execQual.c:2410
frame #14: 0x00000001192498c9 plpgsql.so`exec_eval_simple_expr(estate=0x00007fff50509818, expr=0x00007fcc4d14a0d8, result=0x00007fff505094a8, isNull=0x00007fff505094ff, rettype=0x00007fff505094f8, rettypmod=0x00007fff505094f4) + 633 at pl_exec.c:5449
frame #15: 0x00000001192491e5 plpgsql.so`exec_eval_expr(estate=0x00007fff50509818, expr=0x00007fcc4d14a0d8, isNull=0x00007fff505094ff, rettype=0x00007fff505094f8, rettypmod=0x00007fff505094f4) + 101 at pl_exec.c:5025
frame #16: 0x000000011924b226 plpgsql.so`exec_assign_expr(estate=0x00007fff50509818, target=0x00007fcc4d0f7300, expr=0x00007fcc4d14a0d8) + 118 at pl_exec.c:4201
frame #17: 0x000000011924b88a plpgsql.so`exec_stmt_assign(estate=0x00007fff50509818, stmt=0x00007fcc4d14a1c8) + 106 at pl_exec.c:1556
frame #18: 0x000000011924b4fc plpgsql.so`exec_stmt(estate=0x00007fff50509818, stmt=0x00007fcc4d14a1c8) + 252 at pl_exec.c:1440
frame #19: 0x000000011924b2de plpgsql.so`exec_stmts(estate=0x00007fff50509818, stmts=0x00007fcc4d149e28) + 126 at pl_exec.c:1403
frame #20: 0x0000000119245cbe plpgsql.so`exec_stmt_block(estate=0x00007fff50509818, block=0x00007fcc4d14a3e8) + 2110 at pl_exec.c:1341
frame #21: 0x000000011924403b plpgsql.so`plpgsql_exec_function(func=0x00007fcc4d144d28, fcinfo=0x00007fcc4d0e6700, simple_eval_estate=0x0000000000000000) + 1163 at pl_exec.c:435
frame #22: 0x000000011923d5c6 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fcc4d0e6700) + 470 at pl_handler.c:253
frame #23: 0x000000010f98cbc9 postgres`ExecMakeFunctionResultNoSets(fcache=0x00007fcc4d0e6690, econtext=0x00007fcc4d0e6468, isNull=0x00007fcc4d0e7198, isDone=0x00007fcc4d0e72d8) + 377 at execQual.c:2019
frame #24: 0x000000010f98686d postgres`ExecEvalFunc(fcache=0x00007fcc4d0e6690, econtext=0x00007fcc4d0e6468, isNull=0x00007fcc4d0e7198, isDone=0x00007fcc4d0e72d8) + 205 at execQual.c:2410
frame #25: 0x000000010f98bd9b postgres`ExecTargetList(targetlist=0x00007fcc4d0e72a0, econtext=0x00007fcc4d0e6468, values=0x00007fcc4d0e7178, isnull=0x00007fcc4d0e7198, itemIsDone=0x00007fcc4d0e72d8, isDone=0x00007fff50509c5c) + 187 at execQual.c:5345
frame #26: 0x000000010f98bca4 postgres`ExecProject(projInfo=0x00007fcc4d0e71b8, isDone=0x00007fff50509c5c) + 756 at execQual.c:5560
frame #27: 0x000000010f9af2eb postgres`ExecResult(node=0x00007fcc4d0e6350) + 363 at nodeResult.c:155
frame #28: 0x000000010f9800a0 postgres`ExecProcNode(node=0x00007fcc4d0e6350) + 160 at execProcnode.c:385
frame #29: 0x000000010f97b4f6 postgres`ExecutePlan(estate=0x00007fcc4d0e6238, planstate=0x00007fcc4d0e6350, operation=CMD_SELECT, sendTuples='\x01', numberTuples=0, direction=ForwardScanDirection, dest=0x00007fcc4d01cbb8) + 118 at execMain.c:1549
frame #30: 0x000000010f97b419 postgres`standard_ExecutorRun(queryDesc=0x00007fcc4d1b4638, direction=ForwardScanDirection, count=0) + 457 at execMain.c:337
frame #31: 0x000000010f97b23d postgres`ExecutorRun(queryDesc=0x00007fcc4d1b4638, direction=ForwardScanDirection, count=0) + 77 at execMain.c:285
frame #32: 0x000000010fb4c143 postgres`PortalRunSelect(portal=0x00007fcc4c020c38, forward='\x01', count=0, dest=0x00007fcc4d01cbb8) + 307 at pquery.c:946
frame #33: 0x000000010fb4bb04 postgres`PortalRun(portal=0x00007fcc4c020c38, count=9223372036854775807, isTopLevel='\x01', dest=0x00007fcc4d01cbb8, altdest=0x00007fcc4d01cbb8, completionTag=0x00007fff5050a090) + 820 at pquery.c:790
frame #34: 0x000000010fb470c6 postgres`exec_simple_query(query_string=0x00007fcc4d863c38) + 1302 at postgres.c:1104
frame #35: 0x000000010fb46352 postgres`PostgresMain(argc=1, argv=0x00007fcc4c0028b0, dbname=0x00007fcc4c002710, username=0x00007fcc4c0026f0) + 2546 at postgres.c:4025
frame #36: 0x000000010faaa41e postgres`BackendRun(port=0x00007fcc4be000f0) + 686 at postmaster.c:4230
frame #37: 0x000000010faa99b0 postgres`BackendStartup(port=0x00007fcc4be000f0) + 384 at postmaster.c:3906
frame #38: 0x000000010faa60c7 postgres`ServerLoop + 663 at postmaster.c:1665
frame #39: 0x000000010faa3704 postgres`PostmasterMain(argc=3, argv=0x00007fcc4bc042a0) + 5652 at postmaster.c:1274
frame #40: 0x000000010f9de85c postgres`main(argc=3, argv=0x00007fcc4bc042a0) + 796 at main.c:223
frame #41: 0x00007fff8e9a35fd libdyld.dylib`start + 1
Dump of assembler code for function exec_stmt_block:
0x0000000119245480 <exec_stmt_block+0>: push %rbp
0x0000000119245481 <exec_stmt_block+1>: mov %rsp,%rbp
0x0000000119245484 <exec_stmt_block+4>: sub $0x190,%rsp
0x000000011924548b <exec_stmt_block+11>: lea 0x173a7(%rip),%rax # 0x11925c839
0x0000000119245492 <exec_stmt_block+18>: mov 0x18bc7(%rip),%rcx # 0x11925e060
0x0000000119245499 <exec_stmt_block+25>: mov (%rcx),%rcx
0x000000011924549c <exec_stmt_block+28>: mov %rcx,-0x8(%rbp)
0x00000001192454a0 <exec_stmt_block+32>: mov %rdi,-0xb0(%rbp)
0x00000001192454a7 <exec_stmt_block+39>: mov %rsi,-0xb8(%rbp)
0x00000001192454ae <exec_stmt_block+46>: movl $0xffffffff,-0xbc(%rbp)
0x00000001192454b8 <exec_stmt_block+56>: mov -0xb0(%rbp),%rcx
0x00000001192454bf <exec_stmt_block+63>: mov %rax,0xb0(%rcx)
0x00000001192454c6 <exec_stmt_block+70>: movl $0x0,-0xc0(%rbp)
0x00000001192454d0 <exec_stmt_block+80>: mov -0xc0(%rbp),%eax
0x00000001192454d6 <exec_stmt_block+86>: mov -0xb8(%rbp),%rcx
0x00000001192454dd <exec_stmt_block+93>: cmp 0x18(%rcx),%eax
0x00000001192454e0 <exec_stmt_block+96>: jge 0x1192457bd <exec_stmt_block+829>
0x00000001192454e6 <exec_stmt_block+102>: movslq -0xc0(%rbp),%rax
0x00000001192454ed <exec_stmt_block+109>: mov -0xb8(%rbp),%rcx
0x00000001192454f4 <exec_stmt_block+116>: mov 0x20(%rcx),%rcx
0x00000001192454f8 <exec_stmt_block+120>: mov (%rcx,%rax,4),%edx
0x00000001192454fb <exec_stmt_block+123>: mov %edx,-0xc4(%rbp)
0x0000000119245501 <exec_stmt_block+129>: movslq -0xc4(%rbp),%rax
0x0000000119245508 <exec_stmt_block+136>: mov -0xb0(%rbp),%rcx
0x000000011924550f <exec_stmt_block+143>: mov 0x60(%rcx),%rcx
0x0000000119245513 <exec_stmt_block+147>: mov (%rcx,%rax,8),%rax
0x0000000119245517 <exec_stmt_block+151>: mov (%rax),%edx
0x0000000119245519 <exec_stmt_block+153>: mov %edx,%esi
0x000000011924551b <exec_stmt_block+155>: add $0xfffffffffffffffd,%esi
0x000000011924551e <exec_stmt_block+158>: sub $0x2,%esi
0x0000000119245521 <exec_stmt_block+161>: mov %edx,-0x13c(%rbp)
0x0000000119245527 <exec_stmt_block+167>: mov %esi,-0x140(%rbp)
0x000000011924552d <exec_stmt_block+173>: jb 0x119245750 <exec_stmt_block+720>
0x0000000119245533 <exec_stmt_block+179>: jmpq 0x119245538 <exec_stmt_block+184>
0x0000000119245538 <exec_stmt_block+184>: mov -0x13c(%rbp),%eax
0x000000011924553e <exec_stmt_block+190>: sub $0x2,%eax
0x0000000119245541 <exec_stmt_block+193>: mov %eax,-0x144(%rbp)
0x0000000119245547 <exec_stmt_block+199>: je 0x1192456b8 <exec_stmt_block+568>
0x000000011924554d <exec_stmt_block+205>: jmpq 0x119245552 <exec_stmt_block+210>
0x0000000119245552 <exec_stmt_block+210>: mov -0x13c(%rbp),%eax
0x0000000119245558 <exec_stmt_block+216>: test %eax,%eax
0x000000011924555a <exec_stmt_block+218>: jne 0x119245755 <exec_stmt_block+725>
0x0000000119245560 <exec_stmt_block+224>: jmpq 0x119245565 <exec_stmt_block+229>
0x0000000119245565 <exec_stmt_block+229>: mov $0x0,%rdx
0x000000011924556f <exec_stmt_block+239>: mov $0x1,%ecx
0x0000000119245574 <exec_stmt_block+244>: mov $0x0,%r8d
0x000000011924557a <exec_stmt_block+250>: movslq -0xc4(%rbp),%rax
0x0000000119245581 <exec_stmt_block+257>: mov -0xb0(%rbp),%rsi
0x0000000119245588 <exec_stmt_block+264>: mov 0x60(%rsi),%rsi
0x000000011924558c <exec_stmt_block+268>: mov (%rsi,%rax,8),%rax
0x0000000119245590 <exec_stmt_block+272>: mov %rax,-0xd0(%rbp)
0x0000000119245597 <exec_stmt_block+279>: mov -0xb0(%rbp),%rdi
0x000000011924559e <exec_stmt_block+286>: mov -0xd0(%rbp),%rsi
0x00000001192455a5 <exec_stmt_block+293>: callq 0x119244dc0 <assign_simple_var>
0x00000001192455aa <exec_stmt_block+298>: mov -0xd0(%rbp),%rax
0x00000001192455b1 <exec_stmt_block+305>: cmpq $0x0,0x28(%rax)
0x00000001192455b9 <exec_stmt_block+313>: jne 0x119245692 <exec_stmt_block+530>
0x00000001192455bf <exec_stmt_block+319>: mov -0xd0(%rbp),%rax
0x00000001192455c6 <exec_stmt_block+326>: mov 0x18(%rax),%rax
0x00000001192455ca <exec_stmt_block+330>: movsbl 0x13(%rax),%ecx
0x00000001192455ce <exec_stmt_block+334>: cmp $0x64,%ecx
0x00000001192455d4 <exec_stmt_block+340>: jne 0x11924560b <exec_stmt_block+395>
0x00000001192455da <exec_stmt_block+346>: mov $0x0,%rdx
0x00000001192455e4 <exec_stmt_block+356>: mov $0x1,%ecx
0x00000001192455e9 <exec_stmt_block+361>: mov $0x2c1,%r8d
0x00000001192455ef <exec_stmt_block+367>: mov $0xffffffff,%r9d
0x00000001192455f5 <exec_stmt_block+373>: mov -0xb0(%rbp),%rdi
0x00000001192455fc <exec_stmt_block+380>: mov -0xd0(%rbp),%rax
0x0000000119245603 <exec_stmt_block+387>: mov %rax,%rsi
0x0000000119245606 <exec_stmt_block+390>: callq 0x119247d40 <exec_assign_value>
0x000000011924560b <exec_stmt_block+395>: mov -0xd0(%rbp),%rax
0x0000000119245612 <exec_stmt_block+402>: cmpl $0x0,0x24(%rax)
0x0000000119245619 <exec_stmt_block+409>: je 0x11924568d <exec_stmt_block+525>
0x000000011924561f <exec_stmt_block+415>: jmpq 0x119245624 <exec_stmt_block+420>
0x0000000119245624 <exec_stmt_block+420>: mov $0x14,%edi
0x0000000119245629 <exec_stmt_block+425>: lea 0x16860(%rip),%rsi # 0x11925be90
0x0000000119245630 <exec_stmt_block+432>: mov $0x46e,%edx
0x0000000119245635 <exec_stmt_block+437>: lea 0x17232(%rip),%rcx # 0x11925c86e
0x000000011924563c <exec_stmt_block+444>: lea 0x136fd(%rip),%r8 # 0x119258d40
0x0000000119245643 <exec_stmt_block+451>: callq 0x119256590 <dyld_stub_errstart>
0x0000000119245648 <exec_stmt_block+456>: cmp $0x0,%al
0x000000011924564a <exec_stmt_block+458>: je 0x119245688 <exec_stmt_block+520>
0x0000000119245650 <exec_stmt_block+464>: mov $0x4000082,%edi
0x0000000119245655 <exec_stmt_block+469>: callq 0x119256554 <dyld_stub_errcode>
0x000000011924565a <exec_stmt_block+474>: lea 0x1721d(%rip),%rdi # 0x11925c87e
0x0000000119245661 <exec_stmt_block+481>: mov -0xd0(%rbp),%rcx
0x0000000119245668 <exec_stmt_block+488>: mov 0x8(%rcx),%rsi
0x000000011924566c <exec_stmt_block+492>: mov %eax,-0x148(%rbp)
0x0000000119245672 <exec_stmt_block+498>: mov $0x0,%al
0x0000000119245674 <exec_stmt_block+500>: callq 0x119256578 <dyld_stub_errmsg>
0x0000000119245679 <exec_stmt_block+505>: mov -0x148(%rbp),%edi
0x000000011924567f <exec_stmt_block+511>: mov %eax,%esi
0x0000000119245681 <exec_stmt_block+513>: mov $0x0,%al
0x0000000119245683 <exec_stmt_block+515>: callq 0x11925656c <dyld_stub_errfinish>
0x0000000119245688 <exec_stmt_block+520>: callq 0x119256776 <dyld_stub_abort>
0x000000011924568d <exec_stmt_block+525>: jmpq 0x1192456b3 <exec_stmt_block+563>
0x0000000119245692 <exec_stmt_block+530>: mov -0xb0(%rbp),%rdi
0x0000000119245699 <exec_stmt_block+537>: mov -0xd0(%rbp),%rax
0x00000001192456a0 <exec_stmt_block+544>: mov -0xd0(%rbp),%rcx
0x00000001192456a7 <exec_stmt_block+551>: mov 0x28(%rcx),%rdx
0x00000001192456ab <exec_stmt_block+555>: mov %rax,%rsi
0x00000001192456ae <exec_stmt_block+558>: callq 0x11924b1b0 <exec_assign_expr>
0x00000001192456b3 <exec_stmt_block+563>: jmpq 0x1192457a2 <exec_stmt_block+802>
0x00000001192456b8 <exec_stmt_block+568>: movslq -0xc4(%rbp),%rax
0x00000001192456bf <exec_stmt_block+575>: mov -0xb0(%rbp),%rcx
0x00000001192456c6 <exec_stmt_block+582>: mov 0x60(%rcx),%rcx
0x00000001192456ca <exec_stmt_block+586>: mov (%rcx,%rax,8),%rax
0x00000001192456ce <exec_stmt_block+590>: mov %rax,-0xd8(%rbp)
0x00000001192456d5 <exec_stmt_block+597>: mov -0xd8(%rbp),%rax
0x00000001192456dc <exec_stmt_block+604>: cmpb $0x0,0x28(%rax)
0x00000001192456e0 <exec_stmt_block+608>: je 0x119245701 <exec_stmt_block+641>
0x00000001192456e6 <exec_stmt_block+614>: mov -0xd8(%rbp),%rax
0x00000001192456ed <exec_stmt_block+621>: mov 0x18(%rax),%rdi
0x00000001192456f1 <exec_stmt_block+625>: callq 0x11925665c <dyld_stub_heap_freetuple>
0x00000001192456f6 <exec_stmt_block+630>: mov -0xd8(%rbp),%rax
0x00000001192456fd <exec_stmt_block+637>: movb $0x0,0x28(%rax)
0x0000000119245701 <exec_stmt_block+641>: mov -0xd8(%rbp),%rax
0x0000000119245708 <exec_stmt_block+648>: cmpb $0x0,0x29(%rax)
0x000000011924570c <exec_stmt_block+652>: je 0x11924572d <exec_stmt_block+685>
0x0000000119245712 <exec_stmt_block+658>: mov -0xd8(%rbp),%rax
0x0000000119245719 <exec_stmt_block+665>: mov 0x20(%rax),%rdi
0x000000011924571d <exec_stmt_block+669>: callq 0x119256338 <dyld_stub_FreeTupleDesc>
0x0000000119245722 <exec_stmt_block+674>: mov -0xd8(%rbp),%rax
0x0000000119245729 <exec_stmt_block+681>: movb $0x0,0x29(%rax)
0x000000011924572d <exec_stmt_block+685>: mov -0xd8(%rbp),%rax
0x0000000119245734 <exec_stmt_block+692>: movq $0x0,0x18(%rax)
0x000000011924573c <exec_stmt_block+700>: mov -0xd8(%rbp),%rax
0x0000000119245743 <exec_stmt_block+707>: movq $0x0,0x20(%rax)
0x000000011924574b <exec_stmt_block+715>: jmpq 0x1192457a2 <exec_stmt_block+802>
0x0000000119245750 <exec_stmt_block+720>: jmpq 0x1192457a2 <exec_stmt_block+802>
0x0000000119245755 <exec_stmt_block+725>: jmpq 0x11924575a <exec_stmt_block+730>
0x000000011924575a <exec_stmt_block+730>: lea 0x1672f(%rip),%rdi # 0x11925be90
0x0000000119245761 <exec_stmt_block+737>: mov $0x491,%esi
0x0000000119245766 <exec_stmt_block+742>: lea 0x17101(%rip),%rdx # 0x11925c86e
0x000000011924576d <exec_stmt_block+749>: callq 0x119256548 <dyld_stub_elog_start>
0x0000000119245772 <exec_stmt_block+754>: mov $0x14,%edi
0x0000000119245777 <exec_stmt_block+759>: lea 0x141c1(%rip),%rsi # 0x11925993f
0x000000011924577e <exec_stmt_block+766>: movslq -0xc4(%rbp),%rdx
0x0000000119245785 <exec_stmt_block+773>: mov -0xb0(%rbp),%rax
0x000000011924578c <exec_stmt_block+780>: mov 0x60(%rax),%rax
0x0000000119245790 <exec_stmt_block+784>: mov (%rax,%rdx,8),%rax
0x0000000119245794 <exec_stmt_block+788>: mov (%rax),%edx
0x0000000119245796 <exec_stmt_block+790>: mov $0x0,%al
0x0000000119245798 <exec_stmt_block+792>: callq 0x119256542 <dyld_stub_elog_finish>
0x000000011924579d <exec_stmt_block+797>: callq 0x119256776 <dyld_stub_abort>
0x00000001192457a2 <exec_stmt_block+802>: jmpq 0x1192457a7 <exec_stmt_block+807>
0x00000001192457a7 <exec_stmt_block+807>: mov -0xc0(%rbp),%eax
0x00000001192457ad <exec_stmt_block+813>: add $0x1,%eax
0x00000001192457b2 <exec_stmt_block+818>: mov %eax,-0xc0(%rbp)
0x00000001192457b8 <exec_stmt_block+824>: jmpq 0x1192454d0 <exec_stmt_block+80>
0x00000001192457bd <exec_stmt_block+829>: mov -0xb8(%rbp),%rax
0x00000001192457c4 <exec_stmt_block+836>: cmpq $0x0,0x28(%rax)
0x00000001192457cc <exec_stmt_block+844>: je 0x119245c95 <exec_stmt_block+2069>
0x00000001192457d2 <exec_stmt_block+850>: mov $0x0,%rdi
0x00000001192457dc <exec_stmt_block+860>: lea 0x170d2(%rip),%rax # 0x11925c8b5
0x00000001192457e3 <exec_stmt_block+867>: mov 0x1881e(%rip),%rcx # 0x11925e008
0x00000001192457ea <exec_stmt_block+874>: mov 0x1880f(%rip),%rdx # 0x11925e000
0x00000001192457f1 <exec_stmt_block+881>: mov (%rdx),%rdx
0x00000001192457f4 <exec_stmt_block+884>: mov %rdx,-0xe0(%rbp)
0x00000001192457fb <exec_stmt_block+891>: mov (%rcx),%rcx
0x00000001192457fe <exec_stmt_block+894>: mov %rcx,-0xe8(%rbp)
0x0000000119245805 <exec_stmt_block+901>: mov -0xb0(%rbp),%rcx
0x000000011924580c <exec_stmt_block+908>: mov 0xa0(%rcx),%rcx
0x0000000119245813 <exec_stmt_block+915>: mov %rcx,-0xf0(%rbp)
0x000000011924581a <exec_stmt_block+922>: mov -0xb0(%rbp),%rcx
0x0000000119245821 <exec_stmt_block+929>: mov 0x30(%rcx),%rcx
0x0000000119245825 <exec_stmt_block+933>: mov %rcx,-0xf8(%rbp)
0x000000011924582c <exec_stmt_block+940>: mov -0xb0(%rbp),%rcx
0x0000000119245833 <exec_stmt_block+947>: mov %rax,0xb0(%rcx)
0x000000011924583a <exec_stmt_block+954>: callq 0x1192562ae <dyld_stub_BeginInternalSubTransaction>
0x000000011924583f <exec_stmt_block+959>: mov -0xe0(%rbp),%rdi
0x0000000119245846 <exec_stmt_block+966>: callq 0x119245e50 <MemoryContextSwitchTo>
0x000000011924584b <exec_stmt_block+971>: mov %rax,-0x150(%rbp)
0x0000000119245852 <exec_stmt_block+978>: mov $0x0,%esi
0x0000000119245857 <exec_stmt_block+983>: lea -0xa0(%rbp),%rdi
0x000000011924585e <exec_stmt_block+990>: mov 0x18813(%rip),%rax # 0x11925e078
0x0000000119245865 <exec_stmt_block+997>: mov 0x187bc(%rip),%rcx # 0x11925e028
0x000000011924586c <exec_stmt_block+1004>: mov (%rcx),%rcx
0x000000011924586f <exec_stmt_block+1007>: mov %rcx,-0x100(%rbp)
0x0000000119245876 <exec_stmt_block+1014>: mov (%rax),%rax
0x0000000119245879 <exec_stmt_block+1017>: mov %rax,-0x108(%rbp)
0x0000000119245880 <exec_stmt_block+1024>: callq 0x11925679a <dyld_stub_sigsetjmp>
0x0000000119245885 <exec_stmt_block+1029>: cmp $0x0,%eax
0x000000011924588a <exec_stmt_block+1034>: jne 0x1192459c9 <exec_stmt_block+1353>
0x0000000119245890 <exec_stmt_block+1040>: mov 0x18791(%rip),%rax # 0x11925e028
0x0000000119245897 <exec_stmt_block+1047>: lea -0xa0(%rbp),%rcx
0x000000011924589e <exec_stmt_block+1054>: mov %rcx,(%rax)
0x00000001192458a1 <exec_stmt_block+1057>: mov -0xb0(%rbp),%rdi
0x00000001192458a8 <exec_stmt_block+1064>: callq 0x11924b0a0 <plpgsql_create_econtext>
0x00000001192458ad <exec_stmt_block+1069>: mov -0xb0(%rbp),%rax
0x00000001192458b4 <exec_stmt_block+1076>: movq $0x0,0xb0(%rax)
0x00000001192458bf <exec_stmt_block+1087>: mov -0xb0(%rbp),%rdi
0x00000001192458c6 <exec_stmt_block+1094>: mov -0xb8(%rbp),%rax
0x00000001192458cd <exec_stmt_block+1101>: mov 0x10(%rax),%rsi
0x00000001192458d1 <exec_stmt_block+1105>: callq 0x11924b260 <exec_stmts>
0x00000001192458d6 <exec_stmt_block+1110>: lea 0x16ff5(%rip),%rcx # 0x11925c8d2
0x00000001192458dd <exec_stmt_block+1117>: mov %eax,-0xbc(%rbp)
0x00000001192458e3 <exec_stmt_block+1123>: mov -0xb0(%rbp),%rsi
0x00000001192458ea <exec_stmt_block+1130>: mov %rcx,0xb0(%rsi)
0x00000001192458f1 <exec_stmt_block+1137>: mov -0xbc(%rbp),%eax
0x00000001192458f7 <exec_stmt_block+1143>: cmp $0x2,%eax
0x00000001192458fc <exec_stmt_block+1148>: jne 0x119245981 <exec_stmt_block+1281>
0x0000000119245902 <exec_stmt_block+1154>: mov -0xb0(%rbp),%rax
0x0000000119245909 <exec_stmt_block+1161>: cmpb $0x0,0x1d(%rax)
0x000000011924590d <exec_stmt_block+1165>: jne 0x119245981 <exec_stmt_block+1281>
0x0000000119245913 <exec_stmt_block+1171>: mov -0xb0(%rbp),%rax
0x000000011924591a <exec_stmt_block+1178>: cmpb $0x0,0x10(%rax)
0x000000011924591e <exec_stmt_block+1182>: jne 0x119245981 <exec_stmt_block+1281>
0x0000000119245924 <exec_stmt_block+1188>: mov -0xb0(%rbp),%rax
0x000000011924592b <exec_stmt_block+1195>: cmpq $0x0,0x20(%rax)
0x0000000119245933 <exec_stmt_block+1203>: jne 0x119245981 <exec_stmt_block+1281>
0x0000000119245939 <exec_stmt_block+1209>: lea -0x10a(%rbp),%rsi
0x0000000119245940 <exec_stmt_block+1216>: lea -0x10b(%rbp),%rdx
0x0000000119245947 <exec_stmt_block+1223>: mov -0xb0(%rbp),%rax
0x000000011924594e <exec_stmt_block+1230>: mov 0x14(%rax),%edi
0x0000000119245951 <exec_stmt_block+1233>: callq 0x119256620 <dyld_stub_get_typlenbyval>
0x0000000119245956 <exec_stmt_block+1238>: mov -0xb0(%rbp),%rax
0x000000011924595d <exec_stmt_block+1245>: mov 0x8(%rax),%rdi
0x0000000119245961 <exec_stmt_block+1249>: mov -0x10b(%rbp),%cl
0x0000000119245967 <exec_stmt_block+1255>: movswl -0x10a(%rbp),%edx
0x000000011924596e <exec_stmt_block+1262>: movsbl %cl,%esi
0x0000000119245971 <exec_stmt_block+1265>: callq 0x119256530 <dyld_stub_datumCopy>
0x0000000119245976 <exec_stmt_block+1270>: mov -0xb0(%rbp),%rdi
0x000000011924597d <exec_stmt_block+1277>: mov %rax,0x8(%rdi)
0x0000000119245981 <exec_stmt_block+1281>: callq 0x1192563c8 <dyld_stub_ReleaseCurrentSubTransaction>
0x0000000119245986 <exec_stmt_block+1286>: mov -0xe0(%rbp),%rdi
0x000000011924598d <exec_stmt_block+1293>: callq 0x119245e50 <MemoryContextSwitchTo>
0x0000000119245992 <exec_stmt_block+1298>: mov 0x1866f(%rip),%rdi # 0x11925e008
0x0000000119245999 <exec_stmt_block+1305>: mov -0xe8(%rbp),%rcx
0x00000001192459a0 <exec_stmt_block+1312>: mov %rcx,(%rdi)
0x00000001192459a3 <exec_stmt_block+1315>: mov -0xf0(%rbp),%rcx
0x00000001192459aa <exec_stmt_block+1322>: mov -0xb0(%rbp),%rdi
0x00000001192459b1 <exec_stmt_block+1329>: mov %rcx,0xa0(%rdi)
0x00000001192459b8 <exec_stmt_block+1336>: mov %rax,-0x158(%rbp)
0x00000001192459bf <exec_stmt_block+1343>: callq 0x11925646a <dyld_stub_SPI_restore_connection>
0x00000001192459c4 <exec_stmt_block+1348>: jmpq 0x119245c2d <exec_stmt_block+1965>
0x00000001192459c9 <exec_stmt_block+1353>: lea 0x16f1e(%rip),%rax # 0x11925c8ee
0x00000001192459d0 <exec_stmt_block+1360>: mov 0x186a1(%rip),%rcx # 0x11925e078
0x00000001192459d7 <exec_stmt_block+1367>: mov 0x1864a(%rip),%rdx # 0x11925e028
0x00000001192459de <exec_stmt_block+1374>: mov -0x100(%rbp),%rsi
0x00000001192459e5 <exec_stmt_block+1381>: mov %rsi,(%rdx)
0x00000001192459e8 <exec_stmt_block+1384>: mov -0x108(%rbp),%rdx
0x00000001192459ef <exec_stmt_block+1391>: mov %rdx,(%rcx)
0x00000001192459f2 <exec_stmt_block+1394>: mov -0xb0(%rbp),%rcx
0x00000001192459f9 <exec_stmt_block+1401>: mov %rax,0xb0(%rcx)
0x0000000119245a00 <exec_stmt_block+1408>: mov -0xe0(%rbp),%rdi
0x0000000119245a07 <exec_stmt_block+1415>: callq 0x119245e50 <MemoryContextSwitchTo>
0x0000000119245a0c <exec_stmt_block+1420>: mov %rax,-0x160(%rbp)
0x0000000119245a13 <exec_stmt_block+1427>: callq 0x1192562c6 <dyld_stub_CopyErrorData>
0x0000000119245a18 <exec_stmt_block+1432>: mov %rax,-0x118(%rbp)
0x0000000119245a1f <exec_stmt_block+1439>: callq 0x119256320 <dyld_stub_FlushErrorState>
0x0000000119245a24 <exec_stmt_block+1444>: callq 0x1192563da <dyld_stub_RollbackAndReleaseCurrentSubTransaction>
0x0000000119245a29 <exec_stmt_block+1449>: mov -0xe0(%rbp),%rdi
0x0000000119245a30 <exec_stmt_block+1456>: callq 0x119245e50 <MemoryContextSwitchTo>
0x0000000119245a35 <exec_stmt_block+1461>: mov 0x185cc(%rip),%rcx # 0x11925e008
0x0000000119245a3c <exec_stmt_block+1468>: mov -0xe8(%rbp),%rdx
0x0000000119245a43 <exec_stmt_block+1475>: mov %rdx,(%rcx)
0x0000000119245a46 <exec_stmt_block+1478>: mov -0xf0(%rbp),%rcx
0x0000000119245a4d <exec_stmt_block+1485>: mov -0xb0(%rbp),%rdx
0x0000000119245a54 <exec_stmt_block+1492>: mov %rcx,0xa0(%rdx)
0x0000000119245a5b <exec_stmt_block+1499>: mov %rax,-0x168(%rbp)
0x0000000119245a62 <exec_stmt_block+1506>: callq 0x11925646a <dyld_stub_SPI_restore_connection>
0x0000000119245a67 <exec_stmt_block+1511>: mov -0xb0(%rbp),%rax
0x0000000119245a6e <exec_stmt_block+1518>: movq $0x0,0x90(%rax)
0x0000000119245a79 <exec_stmt_block+1529>: mov -0xb0(%rbp),%rdi
0x0000000119245a80 <exec_stmt_block+1536>: callq 0x1192453a0 <exec_eval_cleanup>
0x0000000119245a85 <exec_stmt_block+1541>: mov -0xb8(%rbp),%rax
0x0000000119245a8c <exec_stmt_block+1548>: mov 0x28(%rax),%rax
0x0000000119245a90 <exec_stmt_block+1552>: mov 0x8(%rax),%rdi
0x0000000119245a94 <exec_stmt_block+1556>: callq 0x11924a3e0 <list_head>
0x0000000119245a99 <exec_stmt_block+1561>: mov %rax,-0x120(%rbp)
0x0000000119245aa0 <exec_stmt_block+1568>: cmpq $0x0,-0x120(%rbp)
0x0000000119245aab <exec_stmt_block+1579>: je 0x119245bed <exec_stmt_block+1901>
0x0000000119245ab1 <exec_stmt_block+1585>: mov -0x120(%rbp),%rax
0x0000000119245ab8 <exec_stmt_block+1592>: mov (%rax),%rax
0x0000000119245abb <exec_stmt_block+1595>: mov %rax,-0x128(%rbp)
0x0000000119245ac2 <exec_stmt_block+1602>: mov -0x118(%rbp),%rdi
0x0000000119245ac9 <exec_stmt_block+1609>: mov -0x128(%rbp),%rax
0x0000000119245ad0 <exec_stmt_block+1616>: mov 0x8(%rax),%rsi
0x0000000119245ad4 <exec_stmt_block+1620>: callq 0x11924b320 <exception_matches_conditions>
0x0000000119245ad9 <exec_stmt_block+1625>: cmp $0x0,%al
0x0000000119245adb <exec_stmt_block+1627>: je 0x119245bd1 <exec_stmt_block+1873>
0x0000000119245ae1 <exec_stmt_block+1633>: mov -0xb8(%rbp),%rax
0x0000000119245ae8 <exec_stmt_block+1640>: mov 0x28(%rax),%rax
0x0000000119245aec <exec_stmt_block+1644>: movslq (%rax),%rax
0x0000000119245aef <exec_stmt_block+1647>: mov -0xb0(%rbp),%rcx
0x0000000119245af6 <exec_stmt_block+1654>: mov 0x60(%rcx),%rcx
0x0000000119245afa <exec_stmt_block+1658>: mov (%rcx,%rax,8),%rax
0x0000000119245afe <exec_stmt_block+1662>: mov %rax,-0x130(%rbp)
0x0000000119245b05 <exec_stmt_block+1669>: mov -0xb8(%rbp),%rax
0x0000000119245b0c <exec_stmt_block+1676>: mov 0x28(%rax),%rax
0x0000000119245b10 <exec_stmt_block+1680>: movslq 0x4(%rax),%rax
0x0000000119245b14 <exec_stmt_block+1684>: mov -0xb0(%rbp),%rcx
0x0000000119245b1b <exec_stmt_block+1691>: mov 0x60(%rcx),%rcx
0x0000000119245b1f <exec_stmt_block+1695>: mov (%rcx,%rax,8),%rax
0x0000000119245b23 <exec_stmt_block+1699>: mov %rax,-0x138(%rbp)
0x0000000119245b2a <exec_stmt_block+1706>: mov -0xb0(%rbp),%rdi
0x0000000119245b31 <exec_stmt_block+1713>: mov -0x130(%rbp),%rsi
0x0000000119245b38 <exec_stmt_block+1720>: mov -0x118(%rbp),%rax
0x0000000119245b3f <exec_stmt_block+1727>: mov 0x38(%rax),%edx
0x0000000119245b42 <exec_stmt_block+1730>: mov %rdi,-0x170(%rbp)
0x0000000119245b49 <exec_stmt_block+1737>: mov %edx,%edi
0x0000000119245b4b <exec_stmt_block+1739>: mov %rsi,-0x178(%rbp)
0x0000000119245b52 <exec_stmt_block+1746>: callq 0x11925674c <dyld_stub_unpack_sql_state>
0x0000000119245b57 <exec_stmt_block+1751>: mov -0x170(%rbp),%rdi
0x0000000119245b5e <exec_stmt_block+1758>: mov -0x178(%rbp),%rsi
0x0000000119245b65 <exec_stmt_block+1765>: mov %rax,%rdx
0x0000000119245b68 <exec_stmt_block+1768>: callq 0x119246cf0 <assign_text_var>
0x0000000119245b6d <exec_stmt_block+1773>: mov -0xb0(%rbp),%rdi
0x0000000119245b74 <exec_stmt_block+1780>: mov -0x138(%rbp),%rsi
0x0000000119245b7b <exec_stmt_block+1787>: mov -0x118(%rbp),%rax
0x0000000119245b82 <exec_stmt_block+1794>: mov 0x40(%rax),%rdx
0x0000000119245b86 <exec_stmt_block+1798>: callq 0x119246cf0 <assign_text_var>
0x0000000119245b8b <exec_stmt_block+1803>: mov -0x118(%rbp),%rax
0x0000000119245b92 <exec_stmt_block+1810>: mov -0xb0(%rbp),%rcx
0x0000000119245b99 <exec_stmt_block+1817>: mov %rax,0x30(%rcx)
0x0000000119245b9d <exec_stmt_block+1821>: mov -0xb0(%rbp),%rax
0x0000000119245ba4 <exec_stmt_block+1828>: movq $0x0,0xb0(%rax)
0x0000000119245baf <exec_stmt_block+1839>: mov -0xb0(%rbp),%rdi
0x0000000119245bb6 <exec_stmt_block+1846>: mov -0x128(%rbp),%rax
0x0000000119245bbd <exec_stmt_block+1853>: mov 0x10(%rax),%rsi
0x0000000119245bc1 <exec_stmt_block+1857>: callq 0x11924b260 <exec_stmts>
0x0000000119245bc6 <exec_stmt_block+1862>: mov %eax,-0xbc(%rbp)
0x0000000119245bcc <exec_stmt_block+1868>: jmpq 0x119245bed <exec_stmt_block+1901>
0x0000000119245bd1 <exec_stmt_block+1873>: jmpq 0x119245bd6 <exec_stmt_block+1878>
0x0000000119245bd6 <exec_stmt_block+1878>: mov -0x120(%rbp),%rax
0x0000000119245bdd <exec_stmt_block+1885>: mov 0x8(%rax),%rax
0x0000000119245be1 <exec_stmt_block+1889>: mov %rax,-0x120(%rbp)
0x0000000119245be8 <exec_stmt_block+1896>: jmpq 0x119245aa0 <exec_stmt_block+1568>
0x0000000119245bed <exec_stmt_block+1901>: mov -0xf8(%rbp),%rax
0x0000000119245bf4 <exec_stmt_block+1908>: mov -0xb0(%rbp),%rcx
0x0000000119245bfb <exec_stmt_block+1915>: mov %rax,0x30(%rcx)
0x0000000119245bff <exec_stmt_block+1919>: cmpq $0x0,-0x120(%rbp)
0x0000000119245c0a <exec_stmt_block+1930>: jne 0x119245c1c <exec_stmt_block+1948>
0x0000000119245c10 <exec_stmt_block+1936>: mov -0x118(%rbp),%rdi
0x0000000119245c17 <exec_stmt_block+1943>: callq 0x1192563b0 <dyld_stub_ReThrowError>
0x0000000119245c1c <exec_stmt_block+1948>: mov -0x118(%rbp),%rdi
0x0000000119245c23 <exec_stmt_block+1955>: callq 0x119256326 <dyld_stub_FreeErrorData>
0x0000000119245c28 <exec_stmt_block+1960>: jmpq 0x119245c2d <exec_stmt_block+1965>
0x0000000119245c2d <exec_stmt_block+1965>: mov 0x18444(%rip),%rax # 0x11925e078
0x0000000119245c34 <exec_stmt_block+1972>: mov 0x183ed(%rip),%rcx # 0x11925e028
0x0000000119245c3b <exec_stmt_block+1979>: mov -0x100(%rbp),%rdx
0x0000000119245c42 <exec_stmt_block+1986>: mov %rdx,(%rcx)
0x0000000119245c45 <exec_stmt_block+1989>: mov -0x108(%rbp),%rcx
0x0000000119245c4c <exec_stmt_block+1996>: mov %rcx,(%rax)
0x0000000119245c4f <exec_stmt_block+1999>: jmpq 0x119245c54 <exec_stmt_block+2004>
0x0000000119245c54 <exec_stmt_block+2004>: mov -0xf8(%rbp),%rax
0x0000000119245c5b <exec_stmt_block+2011>: mov -0xb0(%rbp),%rcx
0x0000000119245c62 <exec_stmt_block+2018>: cmp 0x30(%rcx),%rax
0x0000000119245c66 <exec_stmt_block+2022>: je 0x119245c8b <exec_stmt_block+2059>
0x0000000119245c6c <exec_stmt_block+2028>: lea 0x16c94(%rip),%rdi # 0x11925c907
0x0000000119245c73 <exec_stmt_block+2035>: lea 0x13488(%rip),%rsi # 0x119259102
0x0000000119245c7a <exec_stmt_block+2042>: lea 0x1620f(%rip),%rdx # 0x11925be90
0x0000000119245c81 <exec_stmt_block+2049>: mov $0x534,%ecx
0x0000000119245c86 <exec_stmt_block+2054>: callq 0x119256314 <dyld_stub_ExceptionalCondition>
0x0000000119245c8b <exec_stmt_block+2059>: jmpq 0x119245c90 <exec_stmt_block+2064>
0x0000000119245c90 <exec_stmt_block+2064>: jmpq 0x119245cc4 <exec_stmt_block+2116>
0x0000000119245c95 <exec_stmt_block+2069>: mov -0xb0(%rbp),%rax
0x0000000119245c9c <exec_stmt_block+2076>: movq $0x0,0xb0(%rax)
0x0000000119245ca7 <exec_stmt_block+2087>: mov -0xb0(%rbp),%rdi
0x0000000119245cae <exec_stmt_block+2094>: mov -0xb8(%rbp),%rax
0x0000000119245cb5 <exec_stmt_block+2101>: mov 0x10(%rax),%rsi
0x0000000119245cb9 <exec_stmt_block+2105>: callq 0x11924b260 <exec_stmts>
0x0000000119245cbe <exec_stmt_block+2110>: mov %eax,-0xbc(%rbp)
0x0000000119245cc4 <exec_stmt_block+2116>: mov -0xb0(%rbp),%rax
0x0000000119245ccb <exec_stmt_block+2123>: movq $0x0,0xb0(%rax)
0x0000000119245cd6 <exec_stmt_block+2134>: mov -0xbc(%rbp),%ecx
0x0000000119245cdc <exec_stmt_block+2140>: test %ecx,%ecx
0x0000000119245cde <exec_stmt_block+2142>: mov %ecx,-0x17c(%rbp)
0x0000000119245ce4 <exec_stmt_block+2148>: je 0x119245d26 <exec_stmt_block+2214>
0x0000000119245cea <exec_stmt_block+2154>: jmpq 0x119245cef <exec_stmt_block+2159>
0x0000000119245cef <exec_stmt_block+2159>: mov -0x17c(%rbp),%eax
0x0000000119245cf5 <exec_stmt_block+2165>: sub $0x1,%eax
0x0000000119245cf8 <exec_stmt_block+2168>: mov %eax,-0x180(%rbp)
0x0000000119245cfe <exec_stmt_block+2174>: je 0x119245d37 <exec_stmt_block+2231>
0x0000000119245d04 <exec_stmt_block+2180>: jmpq 0x119245d09 <exec_stmt_block+2185>
0x0000000119245d09 <exec_stmt_block+2185>: mov -0x17c(%rbp),%eax
0x0000000119245d0f <exec_stmt_block+2191>: add $0xfffffffffffffffe,%eax
0x0000000119245d12 <exec_stmt_block+2194>: sub $0x1,%eax
0x0000000119245d15 <exec_stmt_block+2197>: mov %eax,-0x184(%rbp)
0x0000000119245d1b <exec_stmt_block+2203>: ja 0x119245dd2 <exec_stmt_block+2386>
0x0000000119245d21 <exec_stmt_block+2209>: jmpq 0x119245d26 <exec_stmt_block+2214>
0x0000000119245d26 <exec_stmt_block+2214>: mov -0xbc(%rbp),%eax
0x0000000119245d2c <exec_stmt_block+2220>: mov %eax,-0xa4(%rbp)
0x0000000119245d32 <exec_stmt_block+2226>: jmpq 0x119245e0d <exec_stmt_block+2445>
0x0000000119245d37 <exec_stmt_block+2231>: mov -0xb0(%rbp),%rax
0x0000000119245d3e <exec_stmt_block+2238>: cmpq $0x0,0x28(%rax)
0x0000000119245d46 <exec_stmt_block+2246>: jne 0x119245d5b <exec_stmt_block+2267>
0x0000000119245d4c <exec_stmt_block+2252>: movl $0x1,-0xa4(%rbp)
0x0000000119245d56 <exec_stmt_block+2262>: jmpq 0x119245e0d <exec_stmt_block+2445>
0x0000000119245d5b <exec_stmt_block+2267>: mov -0xb8(%rbp),%rax
0x0000000119245d62 <exec_stmt_block+2274>: cmpq $0x0,0x8(%rax)
0x0000000119245d6a <exec_stmt_block+2282>: jne 0x119245d7f <exec_stmt_block+2303>
0x0000000119245d70 <exec_stmt_block+2288>: movl $0x1,-0xa4(%rbp)
0x0000000119245d7a <exec_stmt_block+2298>: jmpq 0x119245e0d <exec_stmt_block+2445>
0x0000000119245d7f <exec_stmt_block+2303>: mov -0xb8(%rbp),%rax
0x0000000119245d86 <exec_stmt_block+2310>: mov 0x8(%rax),%rdi
0x0000000119245d8a <exec_stmt_block+2314>: mov -0xb0(%rbp),%rax
0x0000000119245d91 <exec_stmt_block+2321>: mov 0x28(%rax),%rsi
0x0000000119245d95 <exec_stmt_block+2325>: callq 0x1192567a6 <dyld_stub_strcmp>
0x0000000119245d9a <exec_stmt_block+2330>: cmp $0x0,%eax
0x0000000119245d9f <exec_stmt_block+2335>: je 0x119245db4 <exec_stmt_block+2356>
0x0000000119245da5 <exec_stmt_block+2341>: movl $0x1,-0xa4(%rbp)
0x0000000119245daf <exec_stmt_block+2351>: jmpq 0x119245e0d <exec_stmt_block+2445>
0x0000000119245db4 <exec_stmt_block+2356>: mov -0xb0(%rbp),%rax
0x0000000119245dbb <exec_stmt_block+2363>: movq $0x0,0x28(%rax)
0x0000000119245dc3 <exec_stmt_block+2371>: movl $0x0,-0xa4(%rbp)
0x0000000119245dcd <exec_stmt_block+2381>: jmpq 0x119245e0d <exec_stmt_block+2445>
0x0000000119245dd2 <exec_stmt_block+2386>: jmpq 0x119245dd7 <exec_stmt_block+2391>
0x0000000119245dd7 <exec_stmt_block+2391>: lea 0x160b2(%rip),%rdi # 0x11925be90
0x0000000119245dde <exec_stmt_block+2398>: mov $0x55c,%esi
0x0000000119245de3 <exec_stmt_block+2403>: lea 0x16a84(%rip),%rdx # 0x11925c86e
0x0000000119245dea <exec_stmt_block+2410>: callq 0x119256548 <dyld_stub_elog_start>
0x0000000119245def <exec_stmt_block+2415>: mov $0x14,%edi
0x0000000119245df4 <exec_stmt_block+2420>: lea 0x16b33(%rip),%rsi # 0x11925c92e
0x0000000119245dfb <exec_stmt_block+2427>: mov -0xbc(%rbp),%edx
0x0000000119245e01 <exec_stmt_block+2433>: mov $0x0,%al
0x0000000119245e03 <exec_stmt_block+2435>: callq 0x119256542 <dyld_stub_elog_finish>
0x0000000119245e08 <exec_stmt_block+2440>: callq 0x119256776 <dyld_stub_abort>
0x0000000119245e0d <exec_stmt_block+2445>: mov 0x1824c(%rip),%rax # 0x11925e060
0x0000000119245e14 <exec_stmt_block+2452>: mov -0xa4(%rbp),%ecx
0x0000000119245e1a <exec_stmt_block+2458>: mov (%rax),%rax
0x0000000119245e1d <exec_stmt_block+2461>: cmp -0x8(%rbp),%rax
0x0000000119245e21 <exec_stmt_block+2465>: mov %ecx,-0x188(%rbp)
0x0000000119245e27 <exec_stmt_block+2471>: jne 0x119245e3c <exec_stmt_block+2492>
0x0000000119245e2d <exec_stmt_block+2477>: mov -0x188(%rbp),%eax
0x0000000119245e33 <exec_stmt_block+2483>: add $0x190,%rsp
0x0000000119245e3a <exec_stmt_block+2490>: pop %rbp
0x0000000119245e3b <exec_stmt_block+2491>: retq
0x0000000119245e3c <exec_stmt_block+2492>: callq 0x11925676a <dyld_stub___stack_chk_fail>
End of assembler dump.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
On Sun, Aug 30, 2015 at 1:06 AM, Jim Nasby wrote:
Steps to reproduce:
Download https://github.com/BlueTreble/test_factory/archive/crash.zip
Unzip, cd into directory
pgxn install pgtap (or just make test)
FWIW, make test fails:
! ERROR: 42703: column "c_data_table_name" does not exist
! LINE 4: , c_data_table_name
make install if you didn't do make test
psql -f crash.sql
As does this one with the same error. I am not exactly sure what I am missing.
Regards,
--
Michael
On 9/1/15 1:08 AM, Michael Paquier wrote:
FWIW, make test fails:
! ERROR: 42703: column "c_data_table_name" does not exist
! LINE 4: , c_data_table_name
As does this one with the same error. I am not exactly sure what I am missing.
Well nuts, pretty sure that means the error isn't reproducing for you.
:/ Do you maybe have unusual config options or postgresql.conf options?
Here's a log of how I can reproduce from a just-pulled copy of HEAD:
decibel@decina:[17:22]~/pgsql/HEAD (master $=)$head config.log |grep ./conf
$ ./configure --with-includes=/opt/local/include
--with-libraries=/opt/local/lib --with-perl --with-python
--enable-depend -C --enable-tap-tests
--prefix=/Users/decibel/pgsql/HEAD/i --with-pgport=5555 --enable-debug
CC=ccache clang -Qunused-arguments -fcolor-diagnostics --enable-cassert
CFLAGS=-O0 --no-create --no-recursion
decibel@decina:[17:22]~/pgsql/HEAD (master $=)$
make; make install; pg_ctl init; pg_ctl start; createdb
decibel@decina:[17:20]~/tmp$mv ~/Downloads/test_factory-crash.zip .
decibel@decina:[17:20]~/tmp$open test_factory-crash.zip
decibel@decina:[17:20]~/tmp$cd test_factory-crash/
decibel@decina:[17:20]~/tmp/test_factory-crash$pgHEAD
decibel@decina:[17:20]~/tmp/test_factory-crash$make test
rm -rf sql/test_factory--0.1.1.sql
rm -rf results/ regression.diffs regression.out tmp_check/ log/
cp sql/test_factory.sql sql/test_factory--0.1.1.sql
/bin/sh
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../config/install-sh
-c -d '/Users/decibel/pgsql/HEAD/i/share/extension'
/bin/sh
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../config/install-sh
-c -d '/Users/decibel/pgsql/HEAD/i/share/doc/extension'
/usr/bin/install -c -m 644 .//test_factory.control
'/Users/decibel/pgsql/HEAD/i/share/extension/'
/usr/bin/install -c -m 644 .//doc/test_factory.asc
'/Users/decibel/pgsql/HEAD/i/share/doc/extension/'
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress
--inputdir=./ --bindir='/Users/decibel/pgsql/HEAD/i/bin'
--inputdir=test --load-language=plpgsql --dbname=contrib_regression base
(using postmaster on Unix socket, default port)
============== dropping database "contrib_regression" ==============
NOTICE: database "contrib_regression" does not exist, skipping
DROP DATABASE
============== creating database "contrib_regression" ==============
CREATE DATABASE
ALTER DATABASE
============== installing plpgsql ==============
CREATE LANGUAGE
============== running regression test queries ==============
test base ... FAILED (test process exited with exit
code 2)
======================
1 of 1 tests failed.
======================
The differences that caused some tests to fail can be viewed in the
file "/Users/decibel/tmp/test_factory-crash/regression.diffs". A copy
of the test summary that you see
above is saved in the file
"/Users/decibel/tmp/test_factory-crash/regression.out".
make: [installcheck] Error 1 (ignored)
*** /Users/decibel/tmp/test_factory-crash/test/expected/base.out Sat Aug
29 08:50:08 2015
--- /Users/decibel/tmp/test_factory-crash/results/base.out Tue Sep 1
17:21:04 2015
***************
*** 1,18 ****
\set ECHO none
! ok 1 - Create customer table
! ok 2 - Register test customers
! ok 3 - Create function customer__add
! ok 4 - Create invoice table
! ok 5 - Register test invoices
! ok 6 - customer table is empty
! ok 7 - invoice table is empty
! ok 8 - invoice factory output
! ok 9 - invoice table content
! ok 10 - customer table content
! ok 11 - invoice factory second call
! ok 12 - invoice table content stayed constant
! ok 13 - customer table content stayed constant
! ok 14 - Test function factory
! ok 15 - customer table has new row
! ok 16 - truncate invoice
! ok 17 - invoice factory get remains the same after truncate
--- 1,10 ----
\set ECHO none
! ok 1 - Register test customers
! ok 2 - Create function customer__add
! ok 3 - Register test invoices
! ok 4 - customer table is empty
! ok 5 - invoice table is empty
! server closed the connection unexpectedly
! This probably means the server terminated abnormally
! before or while processing the request.
! connection to server was lost
======================================================================
decibel@decina:[17:21]~/tmp/test_factory-crash$
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
On Wed, Sep 2, 2015 at 7:23 AM, Jim Nasby wrote:
Well nuts, pretty sure that means the error isn't reproducing for you. :/ Do
you maybe have unusual config options or postgresql.conf options?
Yep. And I have found one reason why it was not working: I have been
using --extra-version with a custom string and the Makefile of
test_factory failed to detect that (you may want to use VERSION_NUM
from Makefile.global instead), which also led to test_factory--0.1.1.sql
not being installed.
Even after fixing that, I am facing the same problem when running the test, aka:
psql:crash.sql:42: ERROR: 42703: column "c_data_table_name" does not exist
LINE 4: , c_data_table_name
And the same error shows up whether I use the crash branch from the
github repo or your zip file, and whether I run make test or make
install plus psql -f crash.sql.
test_factory is a jungle to me. Perhaps you could just extract a
self-contained test case? It does not matter if the file is long as
long as the problem can be easily reproduced.
Regards,
--
Michael
On Wed, Sep 2, 2015 at 9:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
And the same error shows up whether I use the crash branch from the
github repo or your zip file, and whether I run make test or make
install plus psql -f crash.sql.
test_factory is a jungle to me. Perhaps you could just extract a
self-contained test case? It does not matter if the file is long as
long as the problem can be easily reproduced.
Mea culpa. It is possible to run crash.sql with test_factory 0.2.1
installed in the cluster instead. So I picked it up from github on the
master branch, deployed it, and then crash.sql is able to run. However,
I was not able to reproduce a failure on master, REL9_4_STABLE, or 9.4.1.
Thoughts?
--
Michael
On 9/1/15 8:42 PM, Michael Paquier wrote:
Mea culpa. It is possible to run crash.sql with test_factory 0.2.1
installed in the cluster instead. So I picked it up from github on the
master branch, deployed it, and then crash.sql is able to run. However,
I was not able to reproduce a failure on master, REL9_4_STABLE, or 9.4.1.
Thoughts?
The crash is triggered by having an exception raised in this particular
call stack. Since there's no syntax error in master/0.2.1, there's no
assert failure either. Were you running asserts? What configure line are
you using? I might be able to track this down to something particular to
my configure.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
On 9/1/15 8:42 PM, Michael Paquier wrote:
test_factory is a jungle to me. Perhaps you could just extract a
self-contained test case? It does not matter if the file is long as
long as the problem can be easily reproduced.
Sorry, more info on what's happening here.
The idea behind test_factory is to allow you to register a command that
creates and returns test data (IE: INSERT INTO table VALUES( DEFAULT,
'my test data' ) RETURNING *). That insert statement is stored in a
table (_tf._test_factory). When this dynamic statement is executed, the
results will be stored in a specially named table (plpgsql variable
c_data_table_name).
When you call tf.get(), it first attempts to grab all the rows from
c_data_table_name. The first time you do this in a database, that table
won't exist. tf.get's exception block will create a temp table holding
the results of the stored statement (IE: INSERT INTO table ...).
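For reference, tf.get() is shaped roughly like this (a heavily
simplified sketch, not the extension's actual code; the naming scheme
and the _tf._test_factory column names here are made up):

CREATE FUNCTION tf.get(rel anyelement, set_name text)
RETURNS SETOF anyelement LANGUAGE plpgsql AS $get$
DECLARE
  -- real names carry a registration id, e.g. invoice_0000000003
  c_data_table_name text := pg_typeof(rel)::text || '_' || set_name;
  v_statement text;
BEGIN
  -- fast path: the data table already exists in this session
  RETURN QUERY EXECUTE format('SELECT * FROM %I', c_data_table_name);
EXCEPTION WHEN undefined_table THEN
  -- first call: materialize the registered statement into a temp table
  SELECT f.test_sql INTO STRICT v_statement        -- hypothetical column
    FROM _tf._test_factory f
   WHERE f.factory_type = pg_typeof(rel)::text     -- hypothetical column
     AND f.set_name = get.set_name;
  EXECUTE format(
    'CREATE TEMP TABLE %I ON COMMIT DROP AS WITH i AS ( %s ) SELECT * FROM i'
    , c_data_table_name, v_statement);
  RETURN QUERY EXECUTE format('SELECT * FROM %I', c_data_table_name);
END
$get$;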
Something else important here is that in crash.sql there is a nested
tf.get() call:
-- Invoice
, $$INSERT INTO invoice VALUES(
DEFAULT
, (tf.get( NULL::customer, 'insert' )).customer_id
, current_date
, current_date + 30
) RETURNING *$$
Note that this calls tf.get for customer (which is a simple INSERT).
This is where stuff gets weird. If you run tf.get( NULL::customer,
'insert' ) you get a regular plpgsql error. If you simply run tf.get()
for invoices, *outside* of tap.results_eq(), you also only get a plpgsql
error. To trigger the assert, you must use tf.get( NULL::invoice, 'base'
) from within tap.results_eq(). That's the only way I've found to
trigger this.
AFAICT, that call stack looks like this:
results_eq opens a cursor to run $$SELECT * FROM tf.get( NULL::invoice,
'base' )$$
plpgsql does its thing and eventually that statement begins execution
tf.get() does a few things then attempts to read from a non-existent
table. tf.get's outer block catches that exception and runs dynamic SQL
to create a temp table. That dynamic SQL contains (in part) this: ,
(tf.get( NULL::customer, 'insert' )).customer_id
*That* tf.get also attempts to read from a non-existent table and
*successfully* creates its temp table. It then does
PERFORM _tf.table_create( c_data_table_name );
which fails due to a bug in _tf.table_create().
Now we have a second exception bubbling back up to the exception handler
of the second tf.get call, which goes up to the exception handler for
the first tf.get call. That call was in the process of creating a temp
table (invoice_003). The error continues up to the FETCH command in
results_eq(). The assert happens somewhere after here, and it's because
the refcount on that temp table (invoice_003) is unexpected. I'm tracing
through this scenario by hand right now to try and figure out exactly
when that assert blows up, but I know it's happening in
results_eq(refcursor, refcursor, text).
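In outline form, the nesting is:

results_eq(text, text, text)
  results_eq(refcursor, refcursor, text)   -- FETCH from the cursor
    tf.get(NULL::invoice, 'base')          -- read fails: no data table yet
      [handler: CREATE TEMP TABLE invoice_... AS ... INSERT INTO invoice ...]
        tf.get(NULL::customer, 'insert')   -- read fails: no data table yet
          [handler: creates its temp table successfully]
          _tf.table_create(...)            -- fails (extension bug)

The second error then unwinds through both tf.get handlers back to the
FETCH, and the assert fires while cleaning up the half-built invoice
temp table.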
BTW, I just updated the crash branch to ensure that test_factory 0.1.1
is what gets installed.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
On Wed, Sep 2, 2015 at 12:59 PM, Jim Nasby wrote:
On 9/1/15 8:42 PM, Michael Paquier wrote:
The crash is triggered by having an exception raised in this particular call
stack. Since there's no syntax error in master/0.2.1, there's no assert
failure either. Were you running asserts?
I thought I was, but apparently I was mistaken. So, after re-rolling
configure, I have been able to reproduce the crash, and as far as I
can see all supported versions are impacted. I tested down to 9.1,
where extensions were introduced, and the stack trace, as well as the
failing assertion, are the same, similar to what Jim has reported. I am
just digging more into this thing now.
--
Michael
On 9/1/15 11:59 PM, Michael Paquier wrote:
I thought I was, but apparently I was mistaken. So, after re-rolling
configure, I have been able to reproduce the crash, and as far as I
can see all supported versions are impacted. I tested down to 9.1,
where extensions were introduced, and the stack trace, as well as the
failing assertion, are the same, similar to what Jim has reported. I am
just digging more into this thing now.
I just had a theory that reference counts were messed up because there
was both a temp table and a real table with the same name. Turns out
that's not the case, but I do now know that the assert is failing for
the reference count on the temp table.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
On Wed, Sep 2, 2015 at 6:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
I just had a theory that reference counts were messed up because there
was both a temp table and a real table with the same name. Turns out
that's not the case, but I do now know that the assert is failing for
the reference count on the temp table.
Yes, that's what I have been looking at actually by having some markers in
relcache.c. The reference count of this relation gets incremented here:
LOG: increment ref count relation: invoice_0000000003, rd_refcnt: 1
CONTEXT: SQL statement "
CREATE TEMP TABLE invoice_0000000003 ON COMMIT DROP AS
WITH i AS (
INSERT INTO invoice VALUES(
DEFAULT
, (tf.get( NULL::customer, 'insert' )).customer_id
, current_date
, current_date + 30
) RETURNING *
)
SELECT *
FROM i
"
PL/pgSQL function tf.get(anyelement,text) line 29 at EXECUTE
PL/pgSQL function results_eq(refcursor,refcursor,text) line 11 at
FETCH
PL/pgSQL function results_eq(text,text,text) line 9 at assignment
STATEMENT: SELECT results_eq(
$$SELECT * FROM tf.get( NULL::invoice, 'base' )$$
, $$VALUES( 1, 1, current_date, current_date + 30 )$$
, 'invoice factory output'
);
And it gets cleared here without being decremented when cleaning up the
second exception:
LOG: clear relation: invoice_0000000003, rd_refcnt: 1
CONTEXT: PL/pgSQL function results_eq(refcursor,refcursor,text) line 11
during exception cleanup
PL/pgSQL function results_eq(text,text,text) line 9 at assignment
STATEMENT: SELECT results_eq(
$$SELECT * FROM tf.get( NULL::invoice, 'base' )$$
, $$VALUES( 1, 1, current_date, current_date + 30 )$$
, 'invoice factory output'
);
--
Michael
On Wed, Sep 2, 2015 at 6:21 AM, Michael Paquier wrote:
Yes, that's what I have been looking at actually by having some markers in relcache.c. The reference count of this relation gets incremented here:
So, I have been playing more with this code, and as mentioned by
Andres this test case goes twice through the PG_CATCH block in
pl_exec.c, causing the crash as the temporary relation does not get
dereferenced. I may be missing something, but isn't it a matter of
moving the switch back to the old memory context to the end of the
block, so that we get a correct cleanup of the estate in the first block?
See for example the attached, which fixes the problem for me and passes
make check-world, though I expect Tom and Andres to jump in and say
that this is not correct within the next 12 hours :)
I haven't written a test case yet, but I think that we could reproduce
this simply by having a relation referenced in the exception block of
a first function, which calls a second function that itself raises an
exception, causing the referencing error.
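Something along these lines, perhaps (an untested sketch of that idea;
all names are made up, and as discussed below it may need more
ingredients, such as a named cursor):

CREATE FUNCTION raise_error() RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  RAISE EXCEPTION 'second-level failure';
END $$;

CREATE FUNCTION outer_func() RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  RAISE EXCEPTION 'force the exception block';
EXCEPTION WHEN OTHERS THEN
  -- relation referenced in the exception block of the first function
  CREATE TEMP TABLE ref_tab ON COMMIT DROP AS SELECT 1 AS x;
  PERFORM raise_error();  -- second function raises its own exception
END $$;

SELECT outer_func();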
Regards,
--
Michael
Attachment: 20150902_plpgsql_crash.patch
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 935fa62..56084c3 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -1253,8 +1253,6 @@ exec_stmt_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block)
/* Abort the inner transaction */
RollbackAndReleaseCurrentSubTransaction();
- MemoryContextSwitchTo(oldcontext);
- CurrentResourceOwner = oldowner;
/* Revert to outer eval_econtext */
estate->eval_econtext = old_eval_econtext;
@@ -1326,6 +1324,9 @@ exec_stmt_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block)
ReThrowError(edata);
else
FreeErrorData(edata);
+
+ MemoryContextSwitchTo(oldcontext);
+ CurrentResourceOwner = oldowner;
}
PG_END_TRY();
Michael Paquier wrote:
I haven't written a test case yet, but I think that we could reproduce
this simply by having a relation referenced in the exception block of
a first function, which calls a second function that itself raises an
exception, causing the referencing error.
Hm, so function 2 is called in the exception block of function 1?
Are you going to produce the test case for this? Jim?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/2/15 2:17 PM, Alvaro Herrera wrote:
Michael Paquier wrote:
I haven't written a test case yet, but I think that we could reproduce
this simply by having a relation referenced in the exception block of
a first function, which calls a second function that itself raises an
exception, causing the referencing error.
Hm, so function 2 is called in the exception block of function 1?
Are you going to produce the test case for this? Jim?
I had attempted one but the issue is that it's more than just an
exception block thing. If it was that simple then we'd still get the
crash without involving pgTap. So I suspect you need to have a named
cursor in the mix as well.
Let me make another attempt at something simpler.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
On 9/2/15 2:56 PM, Jim Nasby wrote:
I had attempted one but the issue is that it's more than just an
exception block thing. If it was that simple then we'd still get the
crash without involving pgTap. So I suspect you need to have a named
cursor in the mix as well.
Let me make another attempt at something simpler.
After some time messing around with this I've been able to remove pgTap
from the equation using the attached crash.sql (relevant snippet below).
This only works if you pass a refcursor into the function. It doesn't
work if you do
OPEN have FOR EXECUTE $$SELECT * FROM tf.get( NULL::invoice, 'base' )$$;
inside the function instead. This code also produces different results;
it outputs the error message before crashing and the stack trace shows
the assert is now failing while trying to abort the top-level
transaction. So it looks like the scenario is:
BEGIN;
DECLARE (open?) cursor that calls function with exception handler that
calls function with exception handler that calls something that fails
The double set of exceptions seems to be critical; if instead of calling
tf.get(invoice) (which recursively does tf.get(customer)) I define the
cursor to call tf.get(customer) directly there's no assert.
I can poke at this more tomorrow, but it'd be very helpful if someone
could explain this failure mode, as I'm basically stumbling in the dark
on a test case right now.
CREATE OR REPLACE FUNCTION a(
have refcursor
) RETURNS void LANGUAGE plpgsql AS $body$
DECLARE
have_rec record;
BEGIN
FETCH have INTO have_rec;
END
$body$;
DECLARE h CURSOR FOR SELECT * FROM tf.get( NULL::invoice, 'base' );
SELECT a(
'h'::refcursor
);
(lldb) bt
* thread #1: tid = 0x722ed8, 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff9001b35c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x00007fff8cf89b1a libsystem_c.dylib`abort + 125
frame #3: 0x000000010cdb4039 postgres`ExceptionalCondition(conditionName=0x000000010cf59cfd, errorType=0x000000010cec392d, fileName=0x000000010cf59045, lineNumber=1972) + 137 at assert.c:54
frame #4: 0x000000010cd9b332 postgres`RelationClearRelation(relation=0x000000011658f110, rebuild='\0') + 162 at relcache.c:1970
frame #5: 0x000000010cd9cc0f postgres`AtEOSubXact_cleanup(relation=0x000000011658f110, isCommit='\0', mySubid=15, parentSubid=1) + 79 at relcache.c:2665
frame #6: 0x000000010cd9cb92 postgres`AtEOSubXact_RelationCache(isCommit='\0', mySubid=15, parentSubid=1) + 242 at relcache.c:2633
frame #7: 0x000000010c9b6e88 postgres`AbortSubTransaction + 440 at xact.c:4373
frame #8: 0x000000010c9b7208 postgres`AbortCurrentTransaction + 312 at xact.c:2947
frame #9: 0x000000010cc249f3 postgres`PostgresMain(argc=1, argv=0x00007fa3f4800378, dbname=0x00007fa3f4800200, username=0x00007fa3f30031f8) + 1875 at postgres.c:3902
frame #10: 0x000000010cb9da48 postgres`PostmasterMain [inlined] BackendRun + 8024 at postmaster.c:4155
frame #11: 0x000000010cb9da22 postgres`PostmasterMain [inlined] BackendStartup at postmaster.c:3829
frame #12: 0x000000010cb9da22 postgres`PostmasterMain [inlined] ServerLoop at postmaster.c:1597
frame #13: 0x000000010cb9da22 postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) + 7986 at postmaster.c:1244
frame #14: 0x000000010cb218cd postgres`main(argc=<unavailable>, argv=<unavailable>) + 1325 at main.c:228
frame #15: 0x00007fff8e9a35fd libdyld.dylib`start + 1
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Attachment: crash.sql
On Thu, Sep 3, 2015 at 1:46 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
The double set of exceptions seems to be critical; if instead of calling
tf.get(invoice) (which recursively does tf.get(customer)) I define the
cursor to call tf.get(customer) directly there's no assert.
Finally I have been able to track down the problem, and it can be
reproduced with the following test case for example:
BEGIN;
CREATE OR REPLACE FUNCTION create_self_tab() RETURNS text
LANGUAGE plpgsql AS $$
BEGIN
CREATE TEMP TABLE new_table ON COMMIT DROP AS SELECT create_self_tab();
RETURN 'foo';
END $$;
DECLARE h CURSOR FOR SELECT create_self_tab();
SAVEPOINT self_tab_point;
FETCH h; -- error
COMMIT;
When fetching the first tuple, the transaction state is rolled back to
the savepoint because the call fails on the second, recursive invocation
(the temporary relation already exists by then), and the temporary table
that was created gets cleaned up while it is still referenced, causing
the assertion failure. So this is not related to the use of exception
blocks. What pointed me at the SAVEPOINT being the trigger is the use of
ON_ERROR_ROLLBACK in the test case Jim proposed. I don't have a patch at
hand yet, but at least things are now easier to test.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
Finally I have been able to track down the problem, and it can be
reproduced with the following test case for example:
Hm. It looks like the problem is that we're executing the function
within the Portal created for cursor "h", and that Portal is older
than the subtransaction created by the savepoint, so when we abort the
subtransaction we do not clean up the Portal. That means that resources
held by that Portal haven't been released --- in particular it still has
a relcache pin on the "new_table" --- when subtransaction abort gets to
the point of wanting to drop relcache entries created during the
subtransaction. So the Assert has caught an actual problem: we'd have
flushed the relcache entry while there was still an open reference to it.
I'm inclined to think that the only real fix for this type of problem
is that at subtransaction abort, we have to prevent further execution
of any Portals that have executed at all within the subxact (ie force
them into PORTAL_FAILED state and clean out their resources, see
MarkPortalFailed). It's not good enough to kill the one(s) that are
actively executing at the instant of failure, because anything that's
run at all since the subxact started might be holding a reference to an
object we're about to delete.
I'm a little worried that that will break usage patterns that work today,
though.
regards, tom lane
I wrote:
I'm inclined to think that the only real fix for this type of problem
is that at subtransaction abort, we have to prevent further execution
of any Portals that have executed at all within the subxact (ie force
them into PORTAL_FAILED state and clean out their resources, see
MarkPortalFailed). It's not good enough to kill the one(s) that are
actively executing at the instant of failure, because anything that's
run at all since the subxact started might be holding a reference to an
object we're about to delete.
I'm a little worried that that will break usage patterns that work today,
though.
I experimented with the attached patch. It seems to work to stop the
crash Michael exhibited (I've not tried to duplicate Jim's original
example, though). However, it also causes a regression test failure,
because transactions.sql does this:
BEGIN;
DECLARE c CURSOR FOR SELECT unique2 FROM tenk1 ORDER BY unique2;
SAVEPOINT one;
FETCH 10 FROM c;
ROLLBACK TO SAVEPOINT one;
FETCH 10 FROM c;
which is exactly the case we're trying to reject now. So that fills
me with fear that this approach would break existing applications.
On the other hand, I do not really see a good alternative :-(.
I thought about trying to detect whether the Portal actually had any
references to "new in subtransaction" objects to decide whether to
kill it or not, but that seems awfully fragile.
Ideas?
regards, tom lane
Attachment: kill-used-cursors-during-subxact-abort.patch
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 2794537..327b2a5 100644
*** a/src/backend/commands/portalcmds.c
--- b/src/backend/commands/portalcmds.c
*************** PersistHoldablePortal(Portal portal)
*** 335,345 ****
/*
* Check for improper portal use, and mark portal active.
*/
! if (portal->status != PORTAL_READY)
! ereport(ERROR,
! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! errmsg("portal \"%s\" cannot be run", portal->name)));
! portal->status = PORTAL_ACTIVE;
/*
* Set up global portal context pointers.
--- 335,341 ----
/*
* Check for improper portal use, and mark portal active.
*/
! MarkPortalActive(portal);
/*
* Set up global portal context pointers.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9c14e8a..0df86a2 100644
*** a/src/backend/tcop/pquery.c
--- b/src/backend/tcop/pquery.c
*************** PortalRun(Portal portal, long count, boo
*** 734,744 ****
/*
* Check for improper portal use, and mark portal active.
*/
! if (portal->status != PORTAL_READY)
! ereport(ERROR,
! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! errmsg("portal \"%s\" cannot be run", portal->name)));
! portal->status = PORTAL_ACTIVE;
/*
* Set up global portal context pointers.
--- 734,740 ----
/*
* Check for improper portal use, and mark portal active.
*/
! MarkPortalActive(portal);
/*
* Set up global portal context pointers.
*************** PortalRunFetch(Portal portal,
*** 1398,1408 ****
/*
* Check for improper portal use, and mark portal active.
*/
! if (portal->status != PORTAL_READY)
! ereport(ERROR,
! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! errmsg("portal \"%s\" cannot be run", portal->name)));
! portal->status = PORTAL_ACTIVE;
/*
* Set up global portal context pointers.
--- 1394,1400 ----
/*
* Check for improper portal use, and mark portal active.
*/
! MarkPortalActive(portal);
/*
* Set up global portal context pointers.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7530498..74f5089 100644
*** a/src/backend/utils/mmgr/portalmem.c
--- b/src/backend/utils/mmgr/portalmem.c
*************** CreatePortal(const char *name, bool allo
*** 232,237 ****
--- 232,238 ----
portal->status = PORTAL_NEW;
portal->cleanup = PortalCleanup;
portal->createSubid = GetCurrentSubTransactionId();
+ portal->activeSubid = portal->createSubid;
portal->strategy = PORTAL_MULTI_QUERY;
portal->cursorOptions = CURSOR_OPT_NO_SCROLL;
portal->atStart = true;
*************** UnpinPortal(Portal portal)
*** 403,408 ****
--- 404,428 ----
}
/*
+ * MarkPortalActive
+ * Transition a portal from READY to ACTIVE state.
+ *
+ * NOTE: never set portal->status = PORTAL_ACTIVE directly; call this instead.
+ */
+ void
+ MarkPortalActive(Portal portal)
+ {
+ /* For safety, this is a runtime test not just an Assert */
+ if (portal->status != PORTAL_READY)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("portal \"%s\" cannot be run", portal->name)));
+ /* Perform the state transition */
+ portal->status = PORTAL_ACTIVE;
+ portal->activeSubid = GetCurrentSubTransactionId();
+ }
+
+ /*
* MarkPortalDone
* Transition a portal from ACTIVE to DONE state.
*
*************** PreCommit_Portals(bool isPrepare)
*** 690,695 ****
--- 710,716 ----
* not belonging to this transaction.
*/
portal->createSubid = InvalidSubTransactionId;
+ portal->activeSubid = InvalidSubTransactionId;
/* Report we changed state */
result = true;
*************** AtCleanup_Portals(void)
*** 836,843 ****
/*
* Pre-subcommit processing for portals.
*
! * Reassign the portals created in the current subtransaction to the parent
! * subtransaction.
*/
void
AtSubCommit_Portals(SubTransactionId mySubid,
--- 857,864 ----
/*
* Pre-subcommit processing for portals.
*
! * Reassign portals created or used in the current subtransaction to the
! * parent subtransaction.
*/
void
AtSubCommit_Portals(SubTransactionId mySubid,
*************** AtSubCommit_Portals(SubTransactionId myS
*** 859,872 ****
if (portal->resowner)
ResourceOwnerNewParent(portal->resowner, parentXactOwner);
}
}
}
/*
* Subtransaction abort handling for portals.
*
! * Deactivate portals created during the failed subtransaction.
! * Note that per AtSubCommit_Portals, this will catch portals created
* in descendants of the subtransaction too.
*
* We don't destroy any portals here; that's done in AtSubCleanup_Portals.
--- 880,895 ----
if (portal->resowner)
ResourceOwnerNewParent(portal->resowner, parentXactOwner);
}
+ if (portal->activeSubid == mySubid)
+ portal->activeSubid = parentSubid;
}
}
/*
* Subtransaction abort handling for portals.
*
! * Deactivate portals created or used during the failed subtransaction.
! * Note that per AtSubCommit_Portals, this will catch portals created/used
* in descendants of the subtransaction too.
*
* We don't destroy any portals here; that's done in AtSubCleanup_Portals.
*************** AtSubAbort_Portals(SubTransactionId mySu
*** 885,900 ****
{
Portal portal = hentry->portal;
if (portal->createSubid != mySubid)
continue;
/*
* Force any live portals of my own subtransaction into FAILED state.
* We have to do this because they might refer to objects created or
* changed in the failed subtransaction, leading to crashes if
* execution is resumed, or even if we just try to run ExecutorEnd.
- * (Note we do NOT do this to upper-level portals, since they cannot
- * have such references and hence may be able to continue.)
*/
if (portal->status == PORTAL_READY ||
portal->status == PORTAL_ACTIVE)
--- 908,956 ----
{
Portal portal = hentry->portal;
+ /* Was it created in this subtransaction? */
if (portal->createSubid != mySubid)
+ {
+ /* No, but maybe it was used in this subtransaction? */
+ if (portal->activeSubid == mySubid)
+ {
+ /* Maintain activeSubid until the portal is removed */
+ portal->activeSubid = parentSubid;
+
+ /*
+ * Upper-level portals that were used in this subxact must be
+ * forced into FAILED state, for the same reasons discussed
+ * below.
+ */
+ if (portal->status == PORTAL_READY ||
+ portal->status == PORTAL_ACTIVE)
+ MarkPortalFailed(portal);
+
+ /*
+ * Also, if we failed it during the current subxact (either
+ * just above, or earlier), reattach its resowner to the
+ * current subtransaction's resowner so that its resources get
+ * cleaned up while we abort this subtransaction. This is
+ * needed to avoid getting Assert failures or worse when we
+ * clean up objects created in the subtransaction.
+ */
+ if (portal->status == PORTAL_FAILED &&
+ portal->resowner)
+ {
+ ResourceOwnerNewParent(portal->resowner,
+ CurrentResourceOwner);
+ portal->resowner = NULL;
+ }
+ }
+ /* If it wasn't created in this subxact, nothing more to do */
continue;
+ }
/*
* Force any live portals of my own subtransaction into FAILED state.
* We have to do this because they might refer to objects created or
* changed in the failed subtransaction, leading to crashes if
* execution is resumed, or even if we just try to run ExecutorEnd.
*/
if (portal->status == PORTAL_READY ||
portal->status == PORTAL_ACTIVE)
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 1267b9e..7215e46 100644
*** a/src/include/utils/portal.h
--- b/src/include/utils/portal.h
*************** typedef struct PortalData
*** 119,130 ****
MemoryContext heap; /* subsidiary memory for portal */
ResourceOwner resowner; /* resources owned by portal */
void (*cleanup) (Portal portal); /* cleanup hook */
- SubTransactionId createSubid; /* the ID of the creating subxact */
/*
! * if createSubid is InvalidSubTransactionId, the portal is held over from
! * a previous transaction
*/
/* The query or queries the portal will execute */
const char *sourceText; /* text of query (as of 8.4, never NULL) */
--- 119,136 ----
MemoryContext heap; /* subsidiary memory for portal */
ResourceOwner resowner; /* resources owned by portal */
void (*cleanup) (Portal portal); /* cleanup hook */
/*
! * State data for remembering which subtransaction(s) the portal was
! * created or used in. If the portal is held over from a previous
! * transaction, both subxids are InvalidSubTransactionId. Otherwise,
! * createSubid is the creating subxact and activeSubid is the last subxact
! * in which we ran the portal. We must deactivate the portal if either of
! * these subxacts becomes aborted, since it might hold references to
! * objects created in said subxact.
*/
+ SubTransactionId createSubid; /* the creating subxact */
+ SubTransactionId activeSubid; /* the last subxact with activity */
/* The query or queries the portal will execute */
const char *sourceText; /* text of query (as of 8.4, never NULL) */
*************** extern Portal CreatePortal(const char *n
*** 207,212 ****
--- 213,219 ----
extern Portal CreateNewPortal(void);
extern void PinPortal(Portal portal);
extern void UnpinPortal(Portal portal);
+ extern void MarkPortalActive(Portal portal);
extern void MarkPortalDone(Portal portal);
extern void MarkPortalFailed(Portal portal);
extern void PortalDrop(Portal portal, bool isTopCommit);
On Fri, Sep 4, 2015 at 6:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I experimented with the attached patch. It seems to work to stop the
crash Michael exhibited (I've not tried to duplicate Jim's original
example, though). However, it also causes a regression test failure,
because transactions.sql does this:
Neat patch; it would have taken me longer to figure that out... I have
tried with Jim's test case and the patch is working.
which is exactly the case we're trying to reject now. So that fills
me with fear that this approach would break existing applications.
On the other hand, I do not really see a good alternative :-(.
This behavior has been present since 2004 (commit fe548629), so that's
a real concern to me, especially if there are drivers around relying on
it. There are for example some code patterns around Postgres ODBC that
could be impacted; I am not sure which ones, but I guess that some
internal error handling is not going to like this.
I thought about trying to detect whether the Portal actually had any
references to "new in subtransaction" objects to decide whether to
kill it or not, but that seems awfully fragile.
Ideas?
Yes. This diff on top of your patch:
@@ -922,8 +922,7 @@ AtSubAbort_Portals(SubTransactionId mySubid,
              * must be forced into FAILED state, for the same reasons
              * discussed below.
              */
-            if (portal->status == PORTAL_READY ||
-                portal->status == PORTAL_ACTIVE)
+            if (portal->status == PORTAL_ACTIVE)
                 MarkPortalFailed(portal);
This way only the active portals are marked as failed. The regression
tests that fail with your patch applied evidently do not activate the
portal they use at the moment of the abort; it is merely marked as
ready to be used. This seems to be the safest approach to me on stable
branches as well as on master: this way we are sure that resources of
the failed subxacts are cleaned up correctly, and that existing
applications are not impacted.
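Concretely, what matters is the portal state at the instant of the
subxact abort (my reading of the two cases in this thread;
generate_series stands in for tenk1 to keep it self-contained):

-- Pattern 1 (transactions.sql): the FETCH completes before the rollback.
BEGIN;
DECLARE c CURSOR FOR SELECT i FROM generate_series(1, 100) AS t(i);
SAVEPOINT one;
FETCH 10 FROM c;            -- the portal is ACTIVE only while this executes
ROLLBACK TO SAVEPOINT one;  -- the portal is READY here, so it is left alone
FETCH 10 FROM c;            -- keeps working, as today
COMMIT;

-- Pattern 2 (create_self_tab): the error is raised while FETCH h is
-- still running, so the portal is ACTIVE at the abort, gets marked
-- FAILED, and its resources are reassigned for cleanup.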
I am attaching a new patch with what I changed and a test case based on my
previous one.
Regards,
--
Michael
Attachment: 20150904_cursor_reference_fix.patch
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 2794537..327b2a5 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -335,11 +335,7 @@ PersistHoldablePortal(Portal portal)
/*
* Check for improper portal use, and mark portal active.
*/
- if (portal->status != PORTAL_READY)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("portal \"%s\" cannot be run", portal->name)));
- portal->status = PORTAL_ACTIVE;
+ MarkPortalActive(portal);
/*
* Set up global portal context pointers.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9c14e8a..0df86a2 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -734,11 +734,7 @@ PortalRun(Portal portal, long count, bool isTopLevel,
/*
* Check for improper portal use, and mark portal active.
*/
- if (portal->status != PORTAL_READY)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("portal \"%s\" cannot be run", portal->name)));
- portal->status = PORTAL_ACTIVE;
+ MarkPortalActive(portal);
/*
* Set up global portal context pointers.
@@ -1398,11 +1394,7 @@ PortalRunFetch(Portal portal,
/*
* Check for improper portal use, and mark portal active.
*/
- if (portal->status != PORTAL_READY)
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("portal \"%s\" cannot be run", portal->name)));
- portal->status = PORTAL_ACTIVE;
+ MarkPortalActive(portal);
/*
* Set up global portal context pointers.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7530498..639c2b3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -232,6 +232,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->status = PORTAL_NEW;
portal->cleanup = PortalCleanup;
portal->createSubid = GetCurrentSubTransactionId();
+ portal->activeSubid = portal->createSubid;
portal->strategy = PORTAL_MULTI_QUERY;
portal->cursorOptions = CURSOR_OPT_NO_SCROLL;
portal->atStart = true;
@@ -403,6 +404,25 @@ UnpinPortal(Portal portal)
}
/*
+ * MarkPortalActive
+ * Transition a portal from READY to ACTIVE state.
+ *
+ * NOTE: never set portal->status = PORTAL_ACTIVE directly; call this instead.
+ */
+void
+MarkPortalActive(Portal portal)
+{
+ /* For safety, this is a runtime test not just an Assert */
+ if (portal->status != PORTAL_READY)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("portal \"%s\" cannot be run", portal->name)));
+ /* Perform the state transition */
+ portal->status = PORTAL_ACTIVE;
+ portal->activeSubid = GetCurrentSubTransactionId();
+}
+
+/*
* MarkPortalDone
* Transition a portal from ACTIVE to DONE state.
*
@@ -690,6 +710,7 @@ PreCommit_Portals(bool isPrepare)
* not belonging to this transaction.
*/
portal->createSubid = InvalidSubTransactionId;
+ portal->activeSubid = InvalidSubTransactionId;
/* Report we changed state */
result = true;
@@ -836,8 +857,8 @@ AtCleanup_Portals(void)
/*
* Pre-subcommit processing for portals.
*
- * Reassign the portals created in the current subtransaction to the parent
- * subtransaction.
+ * Reassign portals created or used in the current subtransaction to the
+ * parent subtransaction.
*/
void
AtSubCommit_Portals(SubTransactionId mySubid,
@@ -859,14 +880,16 @@ AtSubCommit_Portals(SubTransactionId mySubid,
if (portal->resowner)
ResourceOwnerNewParent(portal->resowner, parentXactOwner);
}
+ if (portal->activeSubid == mySubid)
+ portal->activeSubid = parentSubid;
}
}
/*
* Subtransaction abort handling for portals.
*
- * Deactivate portals created during the failed subtransaction.
- * Note that per AtSubCommit_Portals, this will catch portals created
+ * Deactivate portals created or used during the failed subtransaction.
+ * Note that per AtSubCommit_Portals, this will catch portals created/used
* in descendants of the subtransaction too.
*
* We don't destroy any portals here; that's done in AtSubCleanup_Portals.
@@ -885,16 +908,48 @@ AtSubAbort_Portals(SubTransactionId mySubid,
{
Portal portal = hentry->portal;
+ /* Was it created in this subtransaction? */
if (portal->createSubid != mySubid)
+ {
+ /* No, but maybe it was used in this subtransaction? */
+ if (portal->activeSubid == mySubid)
+ {
+ /* Maintain activeSubid until the portal is removed */
+ portal->activeSubid = parentSubid;
+
+ /*
+ * Upper-level active portals that were used in this subxact
+ * must be forced into FAILED state, for the same reasons
+ * discussed below.
+ */
+ if (portal->status == PORTAL_ACTIVE)
+ MarkPortalFailed(portal);
+
+ /*
+ * Also, if we failed it during the current subxact (either
+ * just above, or earlier), reattach its resowner to the
+ * current subtransaction's resowner so that its resources get
+ * cleaned up while we abort this subtransaction. This is
+ * needed to avoid getting Assert failures or worse when we
+ * clean up objects created in the subtransaction.
+ */
+ if (portal->status == PORTAL_FAILED &&
+ portal->resowner)
+ {
+ ResourceOwnerNewParent(portal->resowner,
+ CurrentResourceOwner);
+ portal->resowner = NULL;
+ }
+ }
+ /* If it wasn't used in this subxact, nothing more to do */
continue;
+ }
/*
* Force any live portals of my own subtransaction into FAILED state.
* We have to do this because they might refer to objects created or
* changed in the failed subtransaction, leading to crashes if
* execution is resumed, or even if we just try to run ExecutorEnd.
- * (Note we do NOT do this to upper-level portals, since they cannot
- * have such references and hence may be able to continue.)
*/
if (portal->status == PORTAL_READY ||
portal->status == PORTAL_ACTIVE)
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 1267b9e..7215e46 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -119,12 +119,18 @@ typedef struct PortalData
MemoryContext heap; /* subsidiary memory for portal */
ResourceOwner resowner; /* resources owned by portal */
void (*cleanup) (Portal portal); /* cleanup hook */
- SubTransactionId createSubid; /* the ID of the creating subxact */
/*
- * if createSubid is InvalidSubTransactionId, the portal is held over from
- * a previous transaction
+ * State data for remembering which subtransaction(s) the portal was
+ * created or used in. If the portal is held over from a previous
+ * transaction, both subxids are InvalidSubTransactionId. Otherwise,
+ * createSubid is the creating subxact and activeSubid is the last subxact
+ * in which we ran the portal. We must deactivate the portal if either of
+ * these subxacts becomes aborted, since it might hold references to
+ * objects created in said subxact.
*/
+ SubTransactionId createSubid; /* the creating subxact */
+ SubTransactionId activeSubid; /* the last subxact with activity */
/* The query or queries the portal will execute */
const char *sourceText; /* text of query (as of 8.4, never NULL) */
@@ -207,6 +213,7 @@ extern Portal CreatePortal(const char *name, bool allowDup, bool dupSilent);
extern Portal CreateNewPortal(void);
extern void PinPortal(Portal portal);
extern void UnpinPortal(Portal portal);
+extern void MarkPortalActive(Portal portal);
extern void MarkPortalDone(Portal portal);
extern void MarkPortalFailed(Portal portal);
extern void PortalDrop(Portal portal, bool isTopCommit);
diff --git a/src/test/regress/expected/transactions.out b/src/test/regress/expected/transactions.out
index 5d70863..42f4c78 100644
--- a/src/test/regress/expected/transactions.out
+++ b/src/test/regress/expected/transactions.out
@@ -566,6 +566,24 @@ insert into revalidate_bug values (1);
insert into revalidate_bug values (inverse(0));
drop table revalidate_bug;
drop function inverse(int);
+-- test case for relations referenced in active cursors
+CREATE OR REPLACE FUNCTION create_self_tab() RETURNS text
+LANGUAGE plpgsql AS $$
+BEGIN
+ CREATE TEMP TABLE new_table ON COMMIT DROP AS SELECT create_self_tab();
+ RETURN 'foo';
+END $$;
+BEGIN;
+DECLARE h CURSOR FOR SELECT create_self_tab();
+SAVEPOINT self_tab_point;
+FETCH h; -- error
+ERROR: relation "new_table" already exists
+CONTEXT: SQL statement "CREATE TEMP TABLE new_table ON COMMIT DROP AS SELECT create_self_tab()"
+PL/pgSQL function create_self_tab() line 3 at SQL statement
+SQL statement "CREATE TEMP TABLE new_table ON COMMIT DROP AS SELECT create_self_tab()"
+PL/pgSQL function create_self_tab() line 3 at SQL statement
+COMMIT;
+DROP FUNCTION create_self_tab();
-- verify that cursors created during an aborted subtransaction are
-- closed, but that we do not rollback the effect of any FETCHs
-- performed in the aborted subtransaction
diff --git a/src/test/regress/sql/transactions.sql b/src/test/regress/sql/transactions.sql
index 9fac4a3..8983e79 100644
--- a/src/test/regress/sql/transactions.sql
+++ b/src/test/regress/sql/transactions.sql
@@ -351,6 +351,19 @@ insert into revalidate_bug values (inverse(0));
drop table revalidate_bug;
drop function inverse(int);
+-- test case for relations referenced in active cursors
+CREATE OR REPLACE FUNCTION create_self_tab() RETURNS text
+LANGUAGE plpgsql AS $$
+BEGIN
+ CREATE TEMP TABLE new_table ON COMMIT DROP AS SELECT create_self_tab();
+ RETURN 'foo';
+END $$;
+BEGIN;
+DECLARE h CURSOR FOR SELECT create_self_tab();
+SAVEPOINT self_tab_point;
+FETCH h; -- error
+COMMIT;
+DROP FUNCTION create_self_tab();
-- verify that cursors created during an aborted subtransaction are
-- closed, but that we do not rollback the effect of any FETCHs
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 4, 2015 at 6:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ideas?
Yes. This diff on top of your patch:
@@ -922,8 +922,7 @@ AtSubAbort_Portals(SubTransactionId mySubid,
              * must be forced into FAILED state, for the same reasons
              * discussed below.
              */
-            if (portal->status == PORTAL_READY ||
-                portal->status == PORTAL_ACTIVE)
+            if (portal->status == PORTAL_ACTIVE)
                 MarkPortalFailed(portal);
This way only the active portals are marked as failed.
Hmm. I am not feeling especially comfortable about this: it's not clear
that there's anything preventing a suspended portal from containing a
dangerous reference. However, a quick trial did not show that it was
possible to break it --- in the cases I tried, we detected that a cached
plan was no longer valid, tried to rebuild it, and noticed the missing
object at that point. So maybe it's OK.
On reflection I think that the tracking of activeSubid in my patch is
probably overkill if we're attacking it this way. We can just have
AtSubAbort_Portals fail any ACTIVE portal regardless of subxact level,
which is pretty analogous to what AtAbort_Portals has done for a long
time.
Let me work on this and see if I can get to a simpler patch.
regards, tom lane
On Fri, Sep 4, 2015 at 9:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
On reflection I think that the tracking of activeSubid in my patch is
probably overkill if we're attacking it this way. We can just have
AtSubAbort_Portals fail any ACTIVE portal regardless of subxact level,
which is pretty analogous to what AtAbort_Portals has done for a long
time.
Tracking the activating subxact looked neat from my side; it's more
consistent with what is done when the portal is marked as ready,
particularly with the new routine introduced.
Let me work on this and see if I can get to a simpler patch.
Oh, OK. Thanks!
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 4, 2015 at 9:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
On reflection I think that the tracking of activeSubid in my patch is
probably overkill if we're attacking it this way. We can just have
AtSubAbort_Portals fail any ACTIVE portal regardless of subxact level,
which is pretty analogous to what AtAbort_Portals has done for a long
time.
Tracking the activating subxact looked neat from my side; it's more
consistent with what is done when the portal is marked as ready,
particularly with the new routine introduced.
Actually, that idea did not work at all: it caused errors inside plpgsql
EXCEPTION blocks to try to kill the portal running the outer function call.
Oops.
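The failing shape is as simple as an ordinary handled exception (a
minimal sketch; the portal running the outer SELECT stays ACTIVE for
the whole call):

CREATE FUNCTION f() RETURNS int LANGUAGE plpgsql AS $$
BEGIN
  BEGIN
    PERFORM 1/0;            -- the inner subtransaction aborts here
  EXCEPTION WHEN division_by_zero THEN
    NULL;                   -- handled; execution should continue normally
  END;
  RETURN 1;
END $$;

SELECT f();  -- failing "any ACTIVE portal" at subxact abort would kill
             -- this portal from inside its own exception handler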
regards, tom lane
I wrote:
Hmm. I am not feeling especially comfortable about this: it's not clear
that there's anything preventing a suspended portal from containing a
dangerous reference. However, a quick trial did not show that it was
possible to break it --- in the cases I tried, we detected that a cached
plan was no longer valid, tried to rebuild it, and noticed the missing
object at that point. So maybe it's OK.
After further thought I concluded that this probably is safe. The
portal's original query was created and planned when it was opened,
so it cannot contain any references to objects of the failed
subtransaction. We have a hazard from queries within functions,
but if the portal is suspended then no such functions are in progress.
Attached is an updated patch with comments to this effect and some
minor other code cleanup (mainly, not assuming that CurrentResourceOwner
is the right thing to use in AtSubAbort_Portals).
regards, tom lane
Attachments:
kill-active-portals-in-subxact-abort.patch (text/x-diff, charset=us-ascii)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b53d95f..ae9e850 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*************** AbortSubTransaction(void)
*** 4585,4590 ****
--- 4585,4591 ----
AfterTriggerEndSubXact(false);
AtSubAbort_Portals(s->subTransactionId,
s->parent->subTransactionId,
+ s->curTransactionOwner,
s->parent->curTransactionOwner);
AtEOSubXact_LargeObject(false, s->subTransactionId,
s->parent->subTransactionId);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 2794537..327b2a5 100644
*** a/src/backend/commands/portalcmds.c
--- b/src/backend/commands/portalcmds.c
*************** PersistHoldablePortal(Portal portal)
*** 335,345 ****
/*
* Check for improper portal use, and mark portal active.
*/
! if (portal->status != PORTAL_READY)
! ereport(ERROR,
! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! errmsg("portal \"%s\" cannot be run", portal->name)));
! portal->status = PORTAL_ACTIVE;
/*
* Set up global portal context pointers.
--- 335,341 ----
/*
* Check for improper portal use, and mark portal active.
*/
! MarkPortalActive(portal);
/*
* Set up global portal context pointers.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9c14e8a..0df86a2 100644
*** a/src/backend/tcop/pquery.c
--- b/src/backend/tcop/pquery.c
*************** PortalRun(Portal portal, long count, boo
*** 734,744 ****
/*
* Check for improper portal use, and mark portal active.
*/
! if (portal->status != PORTAL_READY)
! ereport(ERROR,
! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! errmsg("portal \"%s\" cannot be run", portal->name)));
! portal->status = PORTAL_ACTIVE;
/*
* Set up global portal context pointers.
--- 734,740 ----
/*
* Check for improper portal use, and mark portal active.
*/
! MarkPortalActive(portal);
/*
* Set up global portal context pointers.
*************** PortalRunFetch(Portal portal,
*** 1398,1408 ****
/*
* Check for improper portal use, and mark portal active.
*/
! if (portal->status != PORTAL_READY)
! ereport(ERROR,
! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! errmsg("portal \"%s\" cannot be run", portal->name)));
! portal->status = PORTAL_ACTIVE;
/*
* Set up global portal context pointers.
--- 1394,1400 ----
/*
* Check for improper portal use, and mark portal active.
*/
! MarkPortalActive(portal);
/*
* Set up global portal context pointers.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7530498..44a87f7 100644
*** a/src/backend/utils/mmgr/portalmem.c
--- b/src/backend/utils/mmgr/portalmem.c
*************** CreatePortal(const char *name, bool allo
*** 232,237 ****
--- 232,238 ----
portal->status = PORTAL_NEW;
portal->cleanup = PortalCleanup;
portal->createSubid = GetCurrentSubTransactionId();
+ portal->activeSubid = portal->createSubid;
portal->strategy = PORTAL_MULTI_QUERY;
portal->cursorOptions = CURSOR_OPT_NO_SCROLL;
portal->atStart = true;
*************** UnpinPortal(Portal portal)
*** 403,408 ****
--- 404,428 ----
}
/*
+ * MarkPortalActive
+ * Transition a portal from READY to ACTIVE state.
+ *
+ * NOTE: never set portal->status = PORTAL_ACTIVE directly; call this instead.
+ */
+ void
+ MarkPortalActive(Portal portal)
+ {
+ /* For safety, this is a runtime test not just an Assert */
+ if (portal->status != PORTAL_READY)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("portal \"%s\" cannot be run", portal->name)));
+ /* Perform the state transition */
+ portal->status = PORTAL_ACTIVE;
+ portal->activeSubid = GetCurrentSubTransactionId();
+ }
+
+ /*
* MarkPortalDone
* Transition a portal from ACTIVE to DONE state.
*
*************** PreCommit_Portals(bool isPrepare)
*** 690,695 ****
--- 710,716 ----
* not belonging to this transaction.
*/
portal->createSubid = InvalidSubTransactionId;
+ portal->activeSubid = InvalidSubTransactionId;
/* Report we changed state */
result = true;
*************** AtCleanup_Portals(void)
*** 836,843 ****
/*
* Pre-subcommit processing for portals.
*
! * Reassign the portals created in the current subtransaction to the parent
! * subtransaction.
*/
void
AtSubCommit_Portals(SubTransactionId mySubid,
--- 857,864 ----
/*
* Pre-subcommit processing for portals.
*
! * Reassign portals created or used in the current subtransaction to the
! * parent subtransaction.
*/
void
AtSubCommit_Portals(SubTransactionId mySubid,
*************** AtSubCommit_Portals(SubTransactionId myS
*** 859,872 ****
if (portal->resowner)
ResourceOwnerNewParent(portal->resowner, parentXactOwner);
}
}
}
/*
* Subtransaction abort handling for portals.
*
! * Deactivate portals created during the failed subtransaction.
! * Note that per AtSubCommit_Portals, this will catch portals created
* in descendants of the subtransaction too.
*
* We don't destroy any portals here; that's done in AtSubCleanup_Portals.
--- 880,895 ----
if (portal->resowner)
ResourceOwnerNewParent(portal->resowner, parentXactOwner);
}
+ if (portal->activeSubid == mySubid)
+ portal->activeSubid = parentSubid;
}
}
/*
* Subtransaction abort handling for portals.
*
! * Deactivate portals created or used during the failed subtransaction.
! * Note that per AtSubCommit_Portals, this will catch portals created/used
* in descendants of the subtransaction too.
*
* We don't destroy any portals here; that's done in AtSubCleanup_Portals.
*************** AtSubCommit_Portals(SubTransactionId myS
*** 874,879 ****
--- 897,903 ----
void
AtSubAbort_Portals(SubTransactionId mySubid,
SubTransactionId parentSubid,
+ ResourceOwner myXactOwner,
ResourceOwner parentXactOwner)
{
HASH_SEQ_STATUS status;
*************** AtSubAbort_Portals(SubTransactionId mySu
*** 885,900 ****
{
Portal portal = hentry->portal;
if (portal->createSubid != mySubid)
continue;
/*
* Force any live portals of my own subtransaction into FAILED state.
* We have to do this because they might refer to objects created or
! * changed in the failed subtransaction, leading to crashes if
! * execution is resumed, or even if we just try to run ExecutorEnd.
! * (Note we do NOT do this to upper-level portals, since they cannot
! * have such references and hence may be able to continue.)
*/
if (portal->status == PORTAL_READY ||
portal->status == PORTAL_ACTIVE)
--- 909,966 ----
{
Portal portal = hentry->portal;
+ /* Was it created in this subtransaction? */
if (portal->createSubid != mySubid)
+ {
+ /* No, but maybe it was used in this subtransaction? */
+ if (portal->activeSubid == mySubid)
+ {
+ /* Maintain activeSubid until the portal is removed */
+ portal->activeSubid = parentSubid;
+
+ /*
+ * Upper-level portals that failed while running in this
+ * subtransaction must be forced into FAILED state, for the
+ * same reasons discussed below.
+ *
+ * We assume we can get away without forcing upper-level READY
+ * portals to fail, even if they were run and then suspended.
+ * In theory a suspended upper-level portal could have
+ * acquired some references to objects that are about to be
+ * destroyed, but there should be sufficient defenses against
+ * such cases: the portal's original query cannot contain such
+ * references, and any references within, say, cached plans of
+ * PL/pgSQL functions are not from active queries and should
+ * be protected by revalidation logic.
+ */
+ if (portal->status == PORTAL_ACTIVE)
+ MarkPortalFailed(portal);
+
+ /*
+ * Also, if we failed it during the current subtransaction
+ * (either just above, or earlier), reattach its resource
+ * owner to the current subtransaction's resource owner, so
+ * that any resources it still holds will be released while
+ * cleaning up this subtransaction. This prevents some corner
+ * cases wherein we might get Asserts or worse while cleaning
+ * up objects created during the current subtransaction
+ * (because they're still referenced within this portal).
+ */
+ if (portal->status == PORTAL_FAILED && portal->resowner)
+ {
+ ResourceOwnerNewParent(portal->resowner, myXactOwner);
+ portal->resowner = NULL;
+ }
+ }
+ /* Done if it wasn't created in this subtransaction */
continue;
+ }
/*
* Force any live portals of my own subtransaction into FAILED state.
* We have to do this because they might refer to objects created or
! * changed in the failed subtransaction, leading to crashes within
! * ExecutorEnd when portalcmds.c tries to close down the portal.
*/
if (portal->status == PORTAL_READY ||
portal->status == PORTAL_ACTIVE)
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 1267b9e..c369107 100644
*** a/src/include/utils/portal.h
--- b/src/include/utils/portal.h
*************** typedef struct PortalData
*** 119,130 ****
MemoryContext heap; /* subsidiary memory for portal */
ResourceOwner resowner; /* resources owned by portal */
void (*cleanup) (Portal portal); /* cleanup hook */
- SubTransactionId createSubid; /* the ID of the creating subxact */
/*
! * if createSubid is InvalidSubTransactionId, the portal is held over from
! * a previous transaction
*/
/* The query or queries the portal will execute */
const char *sourceText; /* text of query (as of 8.4, never NULL) */
--- 119,134 ----
MemoryContext heap; /* subsidiary memory for portal */
ResourceOwner resowner; /* resources owned by portal */
void (*cleanup) (Portal portal); /* cleanup hook */
/*
! * State data for remembering which subtransaction(s) the portal was
! * created or used in. If the portal is held over from a previous
! * transaction, both subxids are InvalidSubTransactionId. Otherwise,
! * createSubid is the creating subxact and activeSubid is the last subxact
! * in which we ran the portal.
*/
+ SubTransactionId createSubid; /* the creating subxact */
+ SubTransactionId activeSubid; /* the last subxact with activity */
/* The query or queries the portal will execute */
const char *sourceText; /* text of query (as of 8.4, never NULL) */
*************** extern void AtSubCommit_Portals(SubTrans
*** 201,212 ****
--- 205,218 ----
ResourceOwner parentXactOwner);
extern void AtSubAbort_Portals(SubTransactionId mySubid,
SubTransactionId parentSubid,
+ ResourceOwner myXactOwner,
ResourceOwner parentXactOwner);
extern void AtSubCleanup_Portals(SubTransactionId mySubid);
extern Portal CreatePortal(const char *name, bool allowDup, bool dupSilent);
extern Portal CreateNewPortal(void);
extern void PinPortal(Portal portal);
extern void UnpinPortal(Portal portal);
+ extern void MarkPortalActive(Portal portal);
extern void MarkPortalDone(Portal portal);
extern void MarkPortalFailed(Portal portal);
extern void PortalDrop(Portal portal, bool isTopCommit);
diff --git a/src/test/regress/expected/transactions.out b/src/test/regress/expected/transactions.out
index 5d70863..d9b702d 100644
*** a/src/test/regress/expected/transactions.out
--- b/src/test/regress/expected/transactions.out
*************** fetch from foo;
*** 613,618 ****
--- 613,664 ----
(1 row)
abort;
+ -- Test for proper cleanup after a failure in a cursor portal
+ -- that was created in an outer subtransaction
+ CREATE FUNCTION invert(x float8) RETURNS float8 LANGUAGE plpgsql AS
+ $$ begin return 1/x; end $$;
+ CREATE FUNCTION create_temp_tab() RETURNS text
+ LANGUAGE plpgsql AS $$
+ BEGIN
+ CREATE TEMP TABLE new_table (f1 float8);
+ -- case of interest is that we fail while holding an open
+ -- relcache reference to new_table
+ INSERT INTO new_table SELECT invert(0.0);
+ RETURN 'foo';
+ END $$;
+ BEGIN;
+ DECLARE ok CURSOR FOR SELECT * FROM int8_tbl;
+ DECLARE ctt CURSOR FOR SELECT create_temp_tab();
+ FETCH ok;
+ q1 | q2
+ -----+-----
+ 123 | 456
+ (1 row)
+
+ SAVEPOINT s1;
+ FETCH ok; -- should work
+ q1 | q2
+ -----+------------------
+ 123 | 4567890123456789
+ (1 row)
+
+ FETCH ctt; -- error occurs here
+ ERROR: division by zero
+ CONTEXT: PL/pgSQL function invert(double precision) line 1 at RETURN
+ SQL statement "INSERT INTO new_table SELECT invert(0.0)"
+ PL/pgSQL function create_temp_tab() line 6 at SQL statement
+ ROLLBACK TO s1;
+ FETCH ok; -- should work
+ q1 | q2
+ ------------------+-----
+ 4567890123456789 | 123
+ (1 row)
+
+ FETCH ctt; -- must be rejected
+ ERROR: portal "ctt" cannot be run
+ COMMIT;
+ DROP FUNCTION create_temp_tab();
+ DROP FUNCTION invert(x float8);
-- Test for successful cleanup of an aborted transaction at session exit.
-- THIS MUST BE THE LAST TEST IN THIS FILE.
begin;
diff --git a/src/test/regress/sql/transactions.sql b/src/test/regress/sql/transactions.sql
index 9fac4a3..bf9cb05 100644
*** a/src/test/regress/sql/transactions.sql
--- b/src/test/regress/sql/transactions.sql
*************** fetch from foo;
*** 387,392 ****
--- 387,424 ----
abort;
+
+ -- Test for proper cleanup after a failure in a cursor portal
+ -- that was created in an outer subtransaction
+ CREATE FUNCTION invert(x float8) RETURNS float8 LANGUAGE plpgsql AS
+ $$ begin return 1/x; end $$;
+
+ CREATE FUNCTION create_temp_tab() RETURNS text
+ LANGUAGE plpgsql AS $$
+ BEGIN
+ CREATE TEMP TABLE new_table (f1 float8);
+ -- case of interest is that we fail while holding an open
+ -- relcache reference to new_table
+ INSERT INTO new_table SELECT invert(0.0);
+ RETURN 'foo';
+ END $$;
+
+ BEGIN;
+ DECLARE ok CURSOR FOR SELECT * FROM int8_tbl;
+ DECLARE ctt CURSOR FOR SELECT create_temp_tab();
+ FETCH ok;
+ SAVEPOINT s1;
+ FETCH ok; -- should work
+ FETCH ctt; -- error occurs here
+ ROLLBACK TO s1;
+ FETCH ok; -- should work
+ FETCH ctt; -- must be rejected
+ COMMIT;
+
+ DROP FUNCTION create_temp_tab();
+ DROP FUNCTION invert(x float8);
+
+
-- Test for successful cleanup of an aborted transaction at session exit.
-- THIS MUST BE THE LAST TEST IN THIS FILE.
On Fri, Sep 4, 2015 at 12:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
Hmm. I am not feeling especially comfortable about this: it's not clear
that there's anything preventing a suspended portal from containing a
dangerous reference. However, a quick trial did not show that it was
possible to break it --- in the cases I tried, we detected that a cached
plan was no longer valid, tried to rebuild it, and noticed the missing
object at that point. So maybe it's OK.
After further thought I concluded that this probably is safe. The
portal's original query was created and planned when it was opened,
so it cannot contain any references to objects of the failed
subtransaction. We have a hazard from queries within functions,
but if the portal is suspended then no such functions are in progress.
Attached is an updated patch with comments to this effect and some
minor other code cleanup (mainly, not assuming that CurrentResourceOwner
is the right thing to use in AtSubAbort_Portals).
Thanks! This looks good to me.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 4, 2015 at 12:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Attached is an updated patch with comments to this effect and some
minor other code cleanup (mainly, not assuming that CurrentResourceOwner
is the right thing to use in AtSubAbort_Portals).
Thanks! This looks good to me.
If no further comments, I'll see about pushing and back-patching this.
In the stable branches, the new field in Portal needs to go at the end,
to avoid an ABI break for extensions that look at Portals. Otherwise I'm
inclined to push it as-is. In particular, that would break any extensions
that call AtSubAbort_Portals() directly, but I'm having a really hard time
believing that there are any. (If anyone can make a case that there are
some, the fallback would be to remove the new argument in the back
branches and make AtSubAbort_Portals depend on CurrentResourceOwner again;
but I'd rather not do it that way.)
Also, I am very strongly tempted to convert the original problem-causing
Assert in relcache.c into a regular runtime test-and-elog. If we're wrong
about having sealed off this hole completely, I'd much rather see the
result be an elog than silent corruption of the relcache.
regards, tom lane
On Fri, Sep 4, 2015 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, I am very strongly tempted to convert the original problem-causing
Assert in relcache.c into a regular runtime test-and-elog. If we're wrong
about having sealed off this hole completely, I'd much rather see the
result be an elog than silent corruption of the relcache.
+1. For a HEAD-only change I suppose.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 4, 2015 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, I am very strongly tempted to convert the original problem-causing
Assert in relcache.c into a regular runtime test-and-elog. If we're wrong
about having sealed off this hole completely, I'd much rather see the
result be an elog than silent corruption of the relcache.
+1. For a HEAD-only change I suppose.
No, the point is to have less dangerous behavior *in the field* if the
situation occurs. I have every intention of back-patching this bit as
well.
Although after some experimentation with a build lacking the Portal-level
fixes, an elog(ERROR) in RelationClearRelation isn't much fun either,
because the problem case causes errors during error cleanup, soon leading
to "PANIC: ERRORDATA_STACK_SIZE exceeded".
After review of all the callers of RelationClearRelation, it seems like
most of them already have sufficient guards to prevent triggering of the
Assert. The ones that lack such tests are AtEOXact_cleanup and
AtEOSubXact_cleanup. So what I'm now thinking is that those should do
something along the lines of
if (isCommit)
relation->rd_createSubid = parentSubid;
else if (RelationHasReferenceCountZero(relation))
{
RelationClearRelation(relation, false);
return;
}
else
{
elog(WARNING, "rel so-and-so is still referenced");
relation->rd_createSubid = parentSubid;
}
so as to mitigate the damage if there's a dangling reference.
regards, tom lane
I wrote:
After review of all the callers of RelationClearRelation, it seems like
most of them already have sufficient guards to prevent triggering of the
Assert. The ones that lack such tests are AtEOXact_cleanup and
AtEOSubXact_cleanup. So what I'm now thinking is that those should do
something along the lines of
Specifically, this, which can be shown to mitigate the results of the
problem cases in an otherwise-unpatched build.
regards, tom lane
Attachments:
prevent-relcache-assert-failure.patch (text/x-diff, charset=us-ascii)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 44e9509..420ef3d 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*************** RelationClearRelation(Relation relation,
*** 2048,2054 ****
{
/*
* As per notes above, a rel to be rebuilt MUST have refcnt > 0; while of
! * course it would be a bad idea to blow away one with nonzero refcnt.
*/
Assert(rebuild ?
!RelationHasReferenceCountZero(relation) :
--- 2048,2056 ----
{
/*
* As per notes above, a rel to be rebuilt MUST have refcnt > 0; while of
! * course it would be an equally bad idea to blow away one with nonzero
! * refcnt, since that would leave someone somewhere with a dangling
! * pointer. All callers are expected to have verified that this holds.
*/
Assert(rebuild ?
!RelationHasReferenceCountZero(relation) :
*************** AtEOXact_cleanup(Relation relation, bool
*** 2654,2664 ****
{
if (isCommit)
relation->rd_createSubid = InvalidSubTransactionId;
! else
{
RelationClearRelation(relation, false);
return;
}
}
/*
--- 2656,2680 ----
{
if (isCommit)
relation->rd_createSubid = InvalidSubTransactionId;
! else if (RelationHasReferenceCountZero(relation))
{
RelationClearRelation(relation, false);
return;
}
+ else
+ {
+ /*
+ * Hmm, somewhere there's a (leaked?) reference to the relation.
+ * We daren't remove the entry for fear of dereferencing a
+ * dangling pointer later. Bleat, and mark it as not belonging to
+ * the current transaction. Hopefully it'll get cleaned up
+ * eventually. This must be just a WARNING to avoid
+ * error-during-error-recovery loops.
+ */
+ relation->rd_createSubid = InvalidSubTransactionId;
+ elog(WARNING, "cannot remove relcache entry for \"%s\" because it has nonzero refcount",
+ RelationGetRelationName(relation));
+ }
}
/*
*************** AtEOSubXact_cleanup(Relation relation, b
*** 2747,2757 ****
{
if (isCommit)
relation->rd_createSubid = parentSubid;
! else
{
RelationClearRelation(relation, false);
return;
}
}
/*
--- 2763,2786 ----
{
if (isCommit)
relation->rd_createSubid = parentSubid;
! else if (RelationHasReferenceCountZero(relation))
{
RelationClearRelation(relation, false);
return;
}
+ else
+ {
+ /*
+ * Hmm, somewhere there's a (leaked?) reference to the relation.
+ * We daren't remove the entry for fear of dereferencing a
+ * dangling pointer later. Bleat, and transfer it to the parent
+ * subtransaction so we can try again later. This must be just a
+ * WARNING to avoid error-during-error-recovery loops.
+ */
+ relation->rd_createSubid = parentSubid;
+ elog(WARNING, "cannot remove relcache entry for \"%s\" because it has nonzero refcount",
+ RelationGetRelationName(relation));
+ }
}
/*
I wrote:
After review of all the callers of RelationClearRelation, it seems like
most of them already have sufficient guards to prevent triggering of the
Assert. The ones that lack such tests are AtEOXact_cleanup and
AtEOSubXact_cleanup. So what I'm now thinking is that those should do
something along the lines of
Specifically, this, which can be shown to mitigate the results of the
problem cases in an otherwise-unpatched build.
And pushed. I noticed that while the relcache.c fix mitigates the error
pretty well in 9.3 and up, in the older branches you still end up with
a PANIC due to error stack overflow. This may indicate that there's
some patch we'd have been better off back-porting. However, with the
Portal changes in place the test cases work in all branches, so I'm
not excited enough to pursue the point further myself.
regards, tom lane
On 9/4/15 12:41 PM, Tom Lane wrote:
I wrote:
After review of all the callers of RelationClearRelation, it seems like
most of them already have sufficient guards to prevent triggering of the
Assert. The ones that lack such tests are AtEOXact_cleanup and
AtEOSubXact_cleanup. So what I'm now thinking is that those should do
something along the lines of
Specifically, this, which can be shown to mitigate the results of the
problem cases in an otherwise-unpatched build.
And pushed. I noticed that while the relcache.c fix mitigates the error
pretty well in 9.3 and up, in the older branches you still end up with
a PANIC due to error stack overflow. This may indicate that there's
some patch we'd have been better off back-porting. However, with the
Portal changes in place the test cases work in all branches, so I'm
not excited enough to pursue the point further myself.
I was about to test this and was verifying that the crash still worked
when I noticed this in the logs (9.4.1, not HEAD). Not sure if there's
any relation here or not...
WARNING: relation "pg_proc" page 121 is uninitialized --- fixing
WARNING: relation "pg_proc" page 122 is uninitialized --- fixing
WARNING: relation "pg_proc" page 123 is uninitialized --- fixing
WARNING: relation "pg_proc" page 124 is uninitialized --- fixing
WARNING: relation "pg_proc" page 125 is uninitialized --- fixing
WARNING: relation "pg_proc" page 126 is uninitialized --- fixing
WARNING: relation "pg_proc" page 127 is uninitialized --- fixing
WARNING: relation "pg_proc" page 128 is uninitialized --- fixing
WARNING: relation "pg_proc" page 129 is uninitialized --- fixing
WARNING: relation "pg_proc" page 130 is uninitialized --- fixing
WARNING: relation "pg_proc" page 131 is uninitialized --- fixing
WARNING: relation "pg_proc" page 132 is uninitialized --- fixing
WARNING: relation "pg_proc" page 133 is uninitialized --- fixing
WARNING: relation "pg_proc" page 134 is uninitialized --- fixing
WARNING: relation "pg_proc" page 135 is uninitialized --- fixing
WARNING: relation "pg_proc" page 136 is uninitialized --- fixing
WARNING: relation "pg_proc" page 137 is uninitialized --- fixing
WARNING: relation "pg_proc" page 138 is uninitialized --- fixing
WARNING: relation "pg_proc" page 139 is uninitialized --- fixing
WARNING: relation "pg_proc" page 140 is uninitialized --- fixing
WARNING: relation "pg_proc" page 141 is uninitialized --- fixing
WARNING: relation "pg_proc" page 142 is uninitialized --- fixing
WARNING: relation "pg_proc" page 143 is uninitialized --- fixing
WARNING: relation "pg_proc" page 144 is uninitialized --- fixing
WARNING: relation "pg_proc" page 145 is uninitialized --- fixing
WARNING: relation "pg_proc" page 146 is uninitialized --- fixing
WARNING: relation "pg_proc" page 147 is uninitialized --- fixing
WARNING: relation "pg_proc" page 148 is uninitialized --- fixing
WARNING: relation "pg_proc" page 149 is uninitialized --- fixing
WARNING: relation "pg_proc" page 150 is uninitialized --- fixing
WARNING: relation "pg_proc" page 151 is uninitialized --- fixing
WARNING: relation "pg_proc" page 152 is uninitialized --- fixing
WARNING: relation "pg_proc" page 153 is uninitialized --- fixing
WARNING: relation "pg_proc" page 154 is uninitialized --- fixing
WARNING: relation "pg_proc" page 155 is uninitialized --- fixing
WARNING: relation "pg_proc" page 156 is uninitialized --- fixing
WARNING: relation "pg_proc" page 157 is uninitialized --- fixing
WARNING: relation "pg_proc" page 158 is uninitialized --- fixing
WARNING: relation "pg_proc" page 159 is uninitialized --- fixing
WARNING: relation "pg_proc" page 160 is uninitialized --- fixing
WARNING: relation "pg_proc" page 161 is uninitialized --- fixing
WARNING: relation "pg_proc" page 162 is uninitialized --- fixing
WARNING: relation "pg_proc" page 246 is uninitialized --- fixing
WARNING: relation "pg_depend" page 82 is uninitialized --- fixing
WARNING: relation "pg_depend" page 83 is uninitialized --- fixing
WARNING: relation "pg_depend" page 84 is uninitialized --- fixing
WARNING: relation "pg_depend" page 85 is uninitialized --- fixing
WARNING: relation "pg_depend" page 86 is uninitialized --- fixing
WARNING: relation "pg_depend" page 87 is uninitialized --- fixing
WARNING: relation "pg_depend" page 88 is uninitialized --- fixing
WARNING: relation "pg_depend" page 89 is uninitialized --- fixing
WARNING: relation "pg_depend" page 90 is uninitialized --- fixing
WARNING: relation "pg_depend" page 91 is uninitialized --- fixing
WARNING: relation "pg_depend" page 92 is uninitialized --- fixing
WARNING: relation "pg_depend" page 93 is uninitialized --- fixing
WARNING: relation "pg_depend" page 94 is uninitialized --- fixing
WARNING: relation "pg_depend" page 95 is uninitialized --- fixing
WARNING: relation "pg_depend" page 96 is uninitialized --- fixing
WARNING: relation "pg_depend" page 112 is uninitialized --- fixing
WARNING: relation "pg_depend" page 128 is uninitialized --- fixing
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Sat, Sep 5, 2015 at 3:41 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
I was about to test this and was verifying that the crash still worked
when I noticed this in the logs (9.4.1, not HEAD). Not sure if there's any
relation here or not...
WARNING: relation "pg_proc" page 121 is uninitialized --- fixing
WARNING: relation "pg_proc" page 122 is uninitialized --- fixing
[reading vacuumlazy.c...] This seems unrelated and I would not worry about
it. Those system catalogs have been extended with new pages by a couple of
backends, but a crash happened before they could actually insert tuples on
them and commit. Perhaps you were creating a bunch of objects when a crash
happened, no?
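The code path in question, condensed from lazy_scan_heap() in
vacuumlazy.c (paraphrased, not the verbatim source):

    if (PageIsNew(page))
    {
        /* page was added by relation extension but never initialized */
        ereport(WARNING,
                (errmsg("relation \"%s\" page %u is uninitialized --- fixing",
                        relname, blkno)));
        PageInit(page, BufferGetPageSize(buf), 0);
        empty_pages++;
    }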
Coming to the point, did you see a new crash with test_factory? Is that
some remnant of a previous test?
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
On Sat, Sep 5, 2015 at 3:41 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
I was about to test this and was verifying that the crash still worked
when I noticed this in the logs (9.4.1, not HEAD). Not sure if there's any
relation here or not...
WARNING: relation "pg_proc" page 121 is uninitialized --- fixing
WARNING: relation "pg_proc" page 122 is uninitialized --- fixing
[reading vacuumlazy.c...] This seems unrelated and I would not worry about
it. Those system catalogs have been extended with new pages by a couple of
backends, but a crash happened before they could actually insert tuples on
them and commit. Perhaps you were creating a bunch of objects when a crash
happened, no?
If this is the system you were doing the previous crash testing on, I'd
say it's a direct artifact of those crashes. You were repeatedly crashing
the system during transactions that created functions and temp tables,
which would surely make entries in pg_proc, pg_depend, etc. The crash
came before commit and so neither the relation pages nor the WAL entries
would necessarily get to disk. But extension of the relations to make
room would still happen. Moreover, there's probably nothing to cause
those new pages to get added to the relations' FSM, so each new test
attempt would add another set of pages. Eventually autovacuum would
notice, which is what you've got logged here, but it might take a while.
regards, tom lane
On Thu, Sep 03, 2015 at 11:04:11PM -0400, Tom Lane wrote:
*************** AtSubAbort_Portals(SubTransactionId mySu
--- 909,966 ----
{
Portal portal = hentry->portal;
+ /* Was it created in this subtransaction? */
if (portal->createSubid != mySubid)
+ {
+ /* No, but maybe it was used in this subtransaction? */
+ if (portal->activeSubid == mySubid)
+ {
...
+ if (portal->status == PORTAL_ACTIVE)
+ MarkPortalFailed(portal);
Do you have a test case that reaches this particular MarkPortalFailed() call?
My attempts stumbled over the fact that, before we reach here, each of the
three MarkPortalActive() callers will have already called MarkPortalFailed()
in its PG_CATCH block. ("make check" does not reach this call.)
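For reference, the caller pattern in question, condensed from
PortalRun() in pquery.c (the other two callers are structured the same
way; bookkeeping is elided here):

    MarkPortalActive(portal);
    PG_TRY();
    {
        /* ... run the portal; on success the status is updated again,
         * e.g. via MarkPortalDone() ... */
    }
    PG_CATCH();
    {
        /* Uncaught error while executing portal: mark it dead */
        MarkPortalFailed(portal);

        /* Restore global vars and propagate error */
        ActivePortal = saveActivePortal;
        CurrentResourceOwner = saveResourceOwner;
        PortalContext = savePortalContext;

        PG_RE_THROW();
    }
    PG_END_TRY();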
Noah Misch <noah@leadboat.com> writes:
On Thu, Sep 03, 2015 at 11:04:11PM -0400, Tom Lane wrote:
*************** AtSubAbort_Portals(SubTransactionId mySu
--- 909,966 ----
{
Portal portal = hentry->portal;
+ /* Was it created in this subtransaction? */
if (portal->createSubid != mySubid)
+ {
+ /* No, but maybe it was used in this subtransaction? */
+ if (portal->activeSubid == mySubid)
+ {
...
+ if (portal->status == PORTAL_ACTIVE)
+ MarkPortalFailed(portal);
Do you have a test case that reaches this particular MarkPortalFailed() call?
My attempts stumbled over the fact that, before we reach here, each of the
three MarkPortalActive() callers will have already called MarkPortalFailed()
in its PG_CATCH block. ("make check" does not reach this call.)
Offhand I think that's just belt-and-suspenders-too coding. As you say,
we'd typically have failed active portals already before getting here.
But the responsibility of this routine is to *guarantee* that no broken
portals remain active, so I'd not want to remove this check.
Do you have a particular reason for asking?
regards, tom lane
On Sat, Jan 02, 2016 at 01:46:07PM -0500, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Thu, Sep 03, 2015 at 11:04:11PM -0400, Tom Lane wrote:
*************** AtSubAbort_Portals(SubTransactionId mySu
--- 909,966 ----
{
Portal portal = hentry->portal;
+ /* Was it created in this subtransaction? */
if (portal->createSubid != mySubid)
+ {
+ /* No, but maybe it was used in this subtransaction? */
+ if (portal->activeSubid == mySubid)
+ {
...
+ if (portal->status == PORTAL_ACTIVE)
+ MarkPortalFailed(portal);
Do you have a test case that reaches this particular MarkPortalFailed() call?
My attempts stumbled over the fact that, before we reach here, each of the
three MarkPortalActive() callers will have already called MarkPortalFailed()
in its PG_CATCH block. ("make check" does not reach this call.)
Offhand I think that's just belt-and-suspenders-too coding. As you say,
we'd typically have failed active portals already before getting here.
But the responsibility of this routine is to *guarantee* that no broken
portals remain active, so I'd not want to remove this check.
Do you have a particular reason for asking?
After reading the log message for and comments added in commit c5454f9, I
understood that the commit changed PostgreSQL to fail portals that did not
fail before. That is to say, queries that formerly completed without error
would now elicit errors. I was looking into what sorts of queries it affected
in this way. If the new MarkPortalFailed() is indeed dead code, then the
commit affects no query that way. The meat of the commit is then the
ResourceOwnerNewParent() call, which lets queries ERROR cleanly instead of
seeing assertion failures or undefined behavior.
I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
that this is backup only. MarkPortalActive() callers remain responsible for
updating the status to something else before relinquishing control.
Noah Misch <noah@leadboat.com> writes:
I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
that this is backup only. MarkPortalActive() callers remain responsible for
updating the status to something else before relinquishing control.
No, I do not think that would be an improvement. There is no contract
saying that this must be done earlier, IMO.
regards, tom lane
On Sat, Jan 02, 2016 at 07:22:13PM -0500, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
that this is backup only. MarkPortalActive() callers remain responsible for
updating the status to something else before relinquishing control.
No, I do not think that would be an improvement. There is no contract
saying that this must be done earlier, IMO.
Indeed, nobody wrote a contract. The assertion would record what has been the
sole standing practice for eleven years (since commit a393fbf9). It would
prompt discussion if a proposed patch would depart from that practice, and
that is a good thing. Also, every addition of dead code should label that
code to aid future readers.
On Sun, Jan 03, 2016 at 12:35:56AM -0500, Noah Misch wrote:
On Sat, Jan 02, 2016 at 07:22:13PM -0500, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
that this is backup only. MarkPortalActive() callers remain responsible for
updating the status to something else before relinquishing control.
No, I do not think that would be an improvement. There is no contract
saying that this must be done earlier, IMO.
Indeed, nobody wrote a contract. The assertion would record what has been the
sole standing practice for eleven years (since commit a393fbf9). It would
prompt discussion if a proposed patch would depart from that practice, and
that is a good thing. Also, every addition of dead code should label that
code to aid future readers.
Here's the patch I have in mind. AtAbort_Portals() has an older
MarkPortalFailed() call, and the story is somewhat different there per its new
comment. That call is plausible to reach with no bug involved, but
MarkPortalFailed() would then be the wrong thing.
nm
Attachments:
AtSubAbort_Portals-ACTIVE-v1.patch (text/plain, charset=us-ascii)
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index a53673c..632b202 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -762,13 +762,22 @@ AtAbort_Portals(void)
hash_seq_init(&status, PortalHashTable);
while ((hentry = (PortalHashEnt *) hash_seq_search(&status)) != NULL)
{
Portal portal = hentry->portal;
- /* Any portal that was actually running has to be considered broken */
+ /*
+ * See similar code in AtSubAbort_Portals(). This would fire if code
+ * orchestrating multiple top-level transactions within a portal, such
+ * as VACUUM, caught errors and continued under the same portal with a
+ * fresh transaction. No part of core PostgreSQL functions that way,
+ * though it's a fair thing to want. Such code would wish the portal
+ * to remain ACTIVE, as in PreCommit_Portals(); we don't cater to
+ * that. Instead, like AtSubAbort_Portals(), interpret this as a bug.
+ */
+ Assert(portal->status != PORTAL_ACTIVE);
if (portal->status == PORTAL_ACTIVE)
MarkPortalFailed(portal);
/*
* Do nothing else to cursors held over from a previous transaction.
*/
@@ -916,26 +925,29 @@ AtSubAbort_Portals(SubTransactionId mySubid,
if (portal->activeSubid == mySubid)
{
/* Maintain activeSubid until the portal is removed */
portal->activeSubid = parentSubid;
/*
- * Upper-level portals that failed while running in this
- * subtransaction must be forced into FAILED state, for the
- * same reasons discussed below.
+ * If a bug in a MarkPortalActive() caller has it miss cleanup
+ * after having failed while running an upper-level portal in
+ * this subtransaction, we don't know what else in the portal
+ * is wrong. Force it into FAILED state, for the same reasons
+ * discussed below.
*
* We assume we can get away without forcing upper-level READY
* portals to fail, even if they were run and then suspended.
* In theory a suspended upper-level portal could have
* acquired some references to objects that are about to be
* destroyed, but there should be sufficient defenses against
* such cases: the portal's original query cannot contain such
* references, and any references within, say, cached plans of
* PL/pgSQL functions are not from active queries and should
* be protected by revalidation logic.
*/
+ Assert(portal->status != PORTAL_ACTIVE);
if (portal->status == PORTAL_ACTIVE)
MarkPortalFailed(portal);
/*
* Also, if we failed it during the current subtransaction
* (either just above, or earlier), reattach its resource
@@ -957,14 +969,19 @@ AtSubAbort_Portals(SubTransactionId mySubid,
}
/*
* Force any live portals of my own subtransaction into FAILED state.
* We have to do this because they might refer to objects created or
* changed in the failed subtransaction, leading to crashes within
- * ExecutorEnd when portalcmds.c tries to close down the portal.
+ * ExecutorEnd when portalcmds.c tries to close down the portal. Each
+ * MarkPortalActive() caller ensures it updates the portal status
+ * again before relinquishing control, so ACTIVE can't happen here.
+ * If a bug does make it happen, dispose the portal like a normal
+ * MarkPortalActive() caller would.
*/
+ Assert(portal->status != PORTAL_ACTIVE);
if (portal->status == PORTAL_READY ||
portal->status == PORTAL_ACTIVE)
MarkPortalFailed(portal);
/*
* Allow portalcmds.c to clean up the state it knows about, if we
On Thu, Jan 28, 2016 at 12:40 PM, Noah Misch <noah@leadboat.com> wrote:
On Sun, Jan 03, 2016 at 12:35:56AM -0500, Noah Misch wrote:
On Sat, Jan 02, 2016 at 07:22:13PM -0500, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
that this is backup only. MarkPortalActive() callers remain responsible for
updating the status to something else before relinquishing control.
No, I do not think that would be an improvement. There is no contract
saying that this must be done earlier, IMO.
Indeed, nobody wrote a contract. The assertion would record what has been the
sole standing practice for eleven years (since commit a393fbf9). It would
prompt discussion if a proposed patch would depart from that practice, and
that is a good thing. Also, every addition of dead code should label that
code to aid future readers.
Here's the patch I have in mind. AtAbort_Portals() has an older
MarkPortalFailed() call, and the story is somewhat different there per its new
comment. That call is plausible to reach with no bug involved, but
MarkPortalFailed() would then be the wrong thing.
+ * fresh transaction. No part of core PostgreSQL functions that way,
+ * though it's a fair thing to want. Such code would wish the portal
From the point of view of core code, this stands true, but, for my 2c,
honestly, I think that is just going to annoy more people working on
plugins and forks of Postgres. When working on Postgres-XC and
developing stuff for the core code, I recall having been annoyed a
couple of times by similar assert limitations, because sometimes they
did not actually make sense in the context of what I was doing, and
other times things got suddenly broken after bumping the fork's code
base to a major upgrade because a routine used up to now broke its
contract. Perhaps I was doing it wrong at this point though, and
should have used something else.
--
Michael
On Wed, Jan 27, 2016 at 10:40 PM, Noah Misch <noah@leadboat.com> wrote:
On Sun, Jan 03, 2016 at 12:35:56AM -0500, Noah Misch wrote:
On Sat, Jan 02, 2016 at 07:22:13PM -0500, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
that this is backup only. MarkPortalActive() callers remain responsible for
updating the status to something else before relinquishing control.
No, I do not think that would be an improvement. There is no contract
saying that this must be done earlier, IMO.
Indeed, nobody wrote a contract. The assertion would record what has been the
sole standing practice for eleven years (since commit a393fbf9). It would
prompt discussion if a proposed patch would depart from that practice, and
that is a good thing. Also, every addition of dead code should label that
code to aid future readers.
Here's the patch I have in mind. AtAbort_Portals() has an older
MarkPortalFailed() call, and the story is somewhat different there per its new
comment. That call is plausible to reach with no bug involved, but
MarkPortalFailed() would then be the wrong thing.
+ Assert(portal->status != PORTAL_ACTIVE);
if (portal->status == PORTAL_ACTIVE)
MarkPortalFailed(portal);
Now that just looks kooky to me. We assert that the portal isn't
active, but then cater to the possibility that it might be anyway?
That seems totally contrary to our usual programming practices, and a
bad idea for that reason.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 28, 2016 at 12:57:36PM +0900, Michael Paquier wrote:
On Thu, Jan 28, 2016 at 12:40 PM, Noah Misch <noah@leadboat.com> wrote:
+ * fresh transaction. No part of core PostgreSQL functions that way,
+ * though it's a fair thing to want. Such code would wish the portal
From the point of view of core code, this stands true, but, for my 2c,
honestly, I think that is just going to annoy more people working on
plugins and forks of Postgres. When working on Postgres-XC and
developing stuff for the core code, I recall having been annoyed a
couple of times by similar assert limitations
At first, I left out that assertion in case some extension code did the thing
I described, perhaps in a background worker. I then realized that
MarkPortalFailed() is the wrong thing for such code, which would want
treatment similar to this bit of PreCommit_Portals():
/*
* Do not touch active portals --- this can only happen in the case of
* a multi-transaction utility command, such as VACUUM.
*
* Note however that any resource owner attached to such a portal is
* still going to go away, so don't leave a dangling pointer.
*/
if (portal->status == PORTAL_ACTIVE)
{
portal->resowner = NULL;
continue;
}
If you can think of a case where the code would work okay despite its active
portal being marked as failed, that would be a good reason to omit the one
assertion. Otherwise, an assertion seems better than silently doing the
known-wrong thing.
On Wed, Jan 27, 2016 at 11:04:33PM -0500, Robert Haas wrote:
+ Assert(portal->status != PORTAL_ACTIVE);
if (portal->status == PORTAL_ACTIVE)
MarkPortalFailed(portal);
Now that just looks kooky to me. We assert that the portal isn't
active, but then cater to the possibility that it might be anyway?
Right.
That seems totally contrary to our usual programming practices, and a
bad idea for that reason.
It is contrary to our usual programming practices, I agree. I borrowed the
idea from untenured code (da3751c8, 2015-11-11) in load_relcache_init_file():
if (nailed_rels != NUM_CRITICAL_SHARED_RELS ||
nailed_indexes != NUM_CRITICAL_SHARED_INDEXES)
{
elog(WARNING, "found %d nailed shared rels and %d nailed shared indexes in init file, but expected %d and %d respectively",
nailed_rels, nailed_indexes,
NUM_CRITICAL_SHARED_RELS, NUM_CRITICAL_SHARED_INDEXES);
/* Make sure we get developers' attention about this */
Assert(false);
I liked this pattern. It's a good fit for cases that we design to be
impossible yet for which we have a workaround if they do happen. That being
said, if you feel it's bad, I would be fine using elog(FATAL). I envision
this as a master-only change in any case. No PGXN module references
PORTAL_ACTIVE or MarkPortalActive(), so it's unlikely that extension code will
notice the change whether in Assert() form or in elog() form. What is best?
On Wed, Jan 27, 2016 at 11:47 PM, Noah Misch <noah@leadboat.com> wrote:
On Wed, Jan 27, 2016 at 11:04:33PM -0500, Robert Haas wrote:
+ Assert(portal->status != PORTAL_ACTIVE);
if (portal->status == PORTAL_ACTIVE)
MarkPortalFailed(portal);
Now that just looks kooky to me. We assert that the portal isn't
active, but then cater to the possibility that it might be anyway?
Right.
That seems totally contrary to our usual programming practices, and a
bad idea for that reason.
It is contrary to our usual programming practices, I agree. I borrowed the
idea from untenured code (da3751c8, 2015-11-11) in load_relcache_init_file():
if (nailed_rels != NUM_CRITICAL_SHARED_RELS ||
nailed_indexes != NUM_CRITICAL_SHARED_INDEXES)
{
elog(WARNING, "found %d nailed shared rels and %d nailed shared indexes in init file, but expected %d and %d respectively",
nailed_rels, nailed_indexes,
NUM_CRITICAL_SHARED_RELS, NUM_CRITICAL_SHARED_INDEXES);
/* Make sure we get developers' attention about this */
Assert(false);I liked this pattern. It's a good fit for cases that we design to be
impossible yet for which we have a workaround if they do happen. That being
said, if you feel it's bad, I would be fine using elog(FATAL). I envision
this as a master-only change in any case. No PGXN module references
PORTAL_ACTIVE or MarkPortalActive(), so it's unlikely that extension code will
notice the change whether in Assert() form or in elog() form. What is best?
I'm honestly failing to understand why we should change anything at
all. I don't believe that doing something more severe than marking
the portal failed actually improves anything. I suppose if I had to
pick between what you proposed before and elog(FATAL) I'd pick the
latter, but I see no real reason to cut off future code (or
third-party code) at the knees like that. I don't see it as either
necessary or desirable to forbid something just because there's no
clear present use case for it. The code you quote emits a warning
about a reasonably foreseeable situation that can never be right, but
there's no particular reason to think that MarkPortalFailed is the
wrong thing to do here if that situation came up.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 28, 2016 at 11:12:15AM -0500, Robert Haas wrote:
I'm honestly failing to understand why we should change anything at
all. I don't believe that doing something more severe than marking
the portal failed actually improves anything. I suppose if I had to
pick between what you proposed before and elog(FATAL) I'd pick the
latter, but I see no real reason to cut off future code (or
third-party code) at the knees like that. I don't see it as either
necessary or desirable to forbid something just because there's no
clear present use case for it.
As you say, forbidding things makes friction in the event that someone comes
along wanting to do the forbidden thing. Forbidding things also simplifies
the system, making it easier to verify. This decision should depend on the
API's audience. We have prominently-public APIs like lsyscache.h,
stringinfo.h and funcapi.h. They should cater to reasonably-foreseeable use
cases, not just what yesterday's users have actually used. We then have
narrow-interest APIs like subselect.h and view.h. For those, the value of a
simpler system exceeds the risk-adjusted cost of friction. They should impose
strict constraints on their callers.
Portal status belongs to the second category. PortalRun(), PortalRunFetch()
and PersistHoldablePortal() are the three core functions that place portals
into PORTAL_ACTIVE status. No PGXN module does so; none so much as references
a PortalStatus constant or MarkPortal{Active,Done,Failed}() function. If
someone adds a fourth core portal runner, the system will be simpler and
better if that function replicates the structure of the existing three.
The code you quote emits a warning
about a reasonably foreseeable situation that can never be right, but
there's no particular reason to think that MarkPortalFailed is the
wrong thing to do here if that situation came up.
I have two reasons to expect these MarkPortalFailed() calls will be the wrong
thing for hypothetical code reaching them. First, PortalRun() and peers reset
ActivePortal and PortalContext on error in addition to fixing portal status
with MarkPortalFailed(). If code forgets to update the status, there's a
substantial chance it forgot the other two. My patch added a comment
explaining the second reason:
+ /*
+ * See similar code in AtSubAbort_Portals(). This would fire if code
+ * orchestrating multiple top-level transactions within a portal, such
+ * as VACUUM, caught errors and continued under the same portal with a
+ * fresh transaction. No part of core PostgreSQL functions that way,
+ * though it's a fair thing to want. Such code would wish the portal
+ * to remain ACTIVE, as in PreCommit_Portals(); we don't cater to
+ * that. Instead, like AtSubAbort_Portals(), interpret this as a bug.
+ */
On Fri, Jan 29, 2016 at 11:19 PM, Noah Misch <noah@leadboat.com> wrote:
As you say, forbidding things makes friction in the event that someone comes
along wanting to do the forbidden thing. Forbidding things also simplifies
the system, making it easier to verify. This decision should depend on the
API's audience. We have prominently-public APIs like lsyscache.h,
stringinfo.h and funcapi.h. They should cater to reasonably-foreseeable use
cases, not just what yesterday's users have actually used. We then have
narrow-interest APIs like subselect.h and view.h. For those, the value of a
simpler system exceeds the risk-adjusted cost of friction. They should impose
strict constraints on their callers.
Portal status belongs to the second category. PortalRun(), PortalRunFetch()
and PersistHoldablePortal() are the three core functions that place portals
into PORTAL_ACTIVE status. No PGXN module does so; none so much as references
a PortalStatus constant or MarkPortal{Active,Done,Failed}() function. If
someone adds a fourth core portal runner, the system will be simpler and
better if that function replicates the structure of the existing three.
I won't argue with that, but it strikes me that if I were reviewing a
patch to do such a thing, I'd surely compare the new caller against
the existing ones, so any failure to adhere to the same design pattern
would be caught that way. I expect other reviewers and/or committers
would almost certainly do something similar. If we presuppose
incompetence on the part of future reviewers and committers, no action
taken now can secure us.
The code you quote emits a warning
about a reasonably foreseeable situation that can never be right, but
there's no particular reason to think that MarkPortalFailed is the
wrong thing to do here if that situation came up.
I have two reasons to expect these MarkPortalFailed() calls will be the wrong
thing for hypothetical code reaching them. First, PortalRun() and peers reset
ActivePortal and PortalContext on error in addition to fixing portal status
with MarkPortalFailed(). If code forgets to update the status, there's a
substantial chance it forgot the other two. My patch added a comment
explaining the second reason:
+ /*
+ * See similar code in AtSubAbort_Portals(). This would fire if code
+ * orchestrating multiple top-level transactions within a portal, such
+ * as VACUUM, caught errors and continued under the same portal with a
+ * fresh transaction. No part of core PostgreSQL functions that way,
+ * though it's a fair thing to want. Such code would wish the portal
+ * to remain ACTIVE, as in PreCommit_Portals(); we don't cater to
+ * that. Instead, like AtSubAbort_Portals(), interpret this as a bug.
+ */
You may be right, but then again Tom had a different opinion, even
after seeing your patch, and he's no dummy.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Jan 30, 2016 at 07:37:45AM -0500, Robert Haas wrote:
On Fri, Jan 29, 2016 at 11:19 PM, Noah Misch <noah@leadboat.com> wrote:
As you say, forbidding things makes friction in the event that someone comes
along wanting to do the forbidden thing. Forbidding things also simplifies
the system, making it easier to verify. This decision should depend on the
API's audience. We have prominently-public APIs like lsyscache.h,
stringinfo.h and funcapi.h. They should cater to reasonably-foreseeable use
cases, not just what yesterday's users have actually used. We then have
narrow-interest APIs like subselect.h and view.h. For those, the value of a
simpler system exceeds the risk-adjusted cost of friction. They should impose
strict constraints on their callers.
Portal status belongs to the second category. PortalRun(), PortalRunFetch()
and PersistHoldablePortal() are the three core functions that place portals
into PORTAL_ACTIVE status. No PGXN module does so; none so much as references
a PortalStatus constant or MarkPortal{Active,Done,Failed}() function. If
someone adds a fourth core portal runner, the system will be simpler and
better if that function replicates the structure of the existing three.
I won't argue with that, but it strikes me that if I were reviewing a
patch to do such a thing, I'd surely compare the new caller against
the existing ones, so any failure to adhere to the same design pattern
would be caught that way. I expect other reviewers and/or committers
would almost certainly do something similar. If we presuppose
incompetence on the part of future reviewers and committers, no action
taken now can secure us.
You could offer that paragraph as an objection to almost all Assert(), elog(),
and automated tests. Why levy it against this patch? The valuable ways
assertions and tests supplement review are well-established.
You may be right, but then again Tom had a different opinion, even
after seeing your patch, and he's no dummy.
Eh? Tom last posted to this thread before I first posted a patch.
On Sat, Jan 30, 2016 at 2:13 PM, Noah Misch <noah@leadboat.com> wrote:
You could offer that paragraph as an objection to almost all Assert(), elog(),
and automated tests. Why levy it against this patch? The valuable ways
assertions and tests supplement review are well-established.
Sure, that's true, but I don't view all situations in the same way, so
I don't write the same thing in answer to each one. I think I've
pretty much said what I have to say about this; if nothing I wrote up
until now swayed you, it's unlikely that anything else I say after
this point will either.
You may be right, but then again Tom had a different opinion, even
after seeing your patch, and he's no dummy.
Eh? Tom last posted to this thread before I first posted a patch.
/messages/by-id/29758.1451780533@sss.pgh.pa.us
seems to me to be a vote against the concept embodied by the patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Jan 30, 2016 at 04:28:47PM -0500, Robert Haas wrote:
I think I've
pretty much said what I have to say about this; if nothing I wrote up
until now swayed you, it's unlikely that anything else I say after
this point will either.
Say I drop the parts that change the binary. Does the attached v2 manage to
improve PostgreSQL, or is it neutral-or-harmful like v1?
On Sat, Jan 30, 2016 at 04:28:47PM -0500, Robert Haas wrote:
On Sat, Jan 30, 2016 at 2:13 PM, Noah Misch <noah@leadboat.com> wrote:
On Sat, Jan 30, 2016 at 07:37:45AM -0500, Robert Haas wrote:
On Fri, Jan 29, 2016 at 11:19 PM, Noah Misch <noah@leadboat.com> wrote:
On Thu, Jan 28, 2016 at 11:12:15AM -0500, Robert Haas wrote:
The code you quote emits a warning
about a reasonably foreseeable situation that can never be right, but
there's no particular reason to think that MarkPortalFailed is the
wrong thing to do here if that situation came up.
I have two reasons to expect these MarkPortalFailed() calls will be the wrong
thing for hypothetical code reaching them. First, PortalRun() and peers reset
ActivePortal and PortalContext on error in addition to fixing portal status
with MarkPortalFailed(). If code forgets to update the status, there's a
substantial chance it forgot the other two. My patch added a comment
explaining the second reason:

+    /*
+     * See similar code in AtSubAbort_Portals(). This would fire if code
+     * orchestrating multiple top-level transactions within a portal, such
+     * as VACUUM, caught errors and continued under the same portal with a
+     * fresh transaction. No part of core PostgreSQL functions that way,
+     * though it's a fair thing to want. Such code would wish the portal
+     * to remain ACTIVE, as in PreCommit_Portals(); we don't cater to
+     * that. Instead, like AtSubAbort_Portals(), interpret this as a bug.
+     */

You may be right, but then again Tom had a different opinion, even
after seeing your patch, and he's no dummy.
Eh? Tom last posted to this thread before I first posted a patch.
/messages/by-id/29758.1451780533@sss.pgh.pa.us
seems to me to be a vote against the concept embodied by the patch.
That much is true. The order of postings is the opposite of what you stated,
and there's no mailing list evidence that anyone other than you has read my
explanation of why MarkPortalFailed() is wrong here. Alternatively, I demand
the schematics for Tom's time machine.
Attachment: AtSubAbort_Portals-ACTIVE-v2.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index a53673c..c840aa6 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -765,7 +765,14 @@ AtAbort_Portals(void)
 	{
 		Portal		portal = hentry->portal;
 
-		/* Any portal that was actually running has to be considered broken */
+		/*
+		 * See similar code in AtSubAbort_Portals(). This would fire if code
+		 * orchestrating multiple top-level transactions within a portal, such
+		 * as VACUUM, caught errors and continued under the same portal with a
+		 * fresh transaction. No part of core PostgreSQL functions that way.
+		 * XXX Such code would wish the portal to remain ACTIVE, as in
+		 * PreCommit_Portals().
+		 */
 		if (portal->status == PORTAL_ACTIVE)
 			MarkPortalFailed(portal);
 
@@ -919,9 +926,11 @@ AtSubAbort_Portals(SubTransactionId mySubid,
 				portal->activeSubid = parentSubid;
 
 				/*
-				 * Upper-level portals that failed while running in this
-				 * subtransaction must be forced into FAILED state, for the
-				 * same reasons discussed below.
+				 * If a bug in a MarkPortalActive() caller has it miss cleanup
+				 * after having failed while running an upper-level portal in
+				 * this subtransaction, we don't know what else in the portal
+				 * is wrong. Force it into FAILED state, for the same reasons
+				 * discussed below.
 				 *
 				 * We assume we can get away without forcing upper-level READY
 				 * portals to fail, even if they were run and then suspended.
@@ -960,7 +969,11 @@ AtSubAbort_Portals(SubTransactionId mySubid,
 			 * Force any live portals of my own subtransaction into FAILED state.
 			 * We have to do this because they might refer to objects created or
 			 * changed in the failed subtransaction, leading to crashes within
-			 * ExecutorEnd when portalcmds.c tries to close down the portal.
+			 * ExecutorEnd when portalcmds.c tries to close down the portal. Each
+			 * MarkPortalActive() caller ensures it updates the portal status
+			 * again before relinquishing control, so ACTIVE can't happen here.
+			 * If a bug does make it happen, dispose the portal like a normal
+			 * MarkPortalActive() caller would.
 			 */
 			if (portal->status == PORTAL_READY ||
 				portal->status == PORTAL_ACTIVE)
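For illustration, the scenario the new comments contemplate might look
roughly like this hypothetical sketch (no core PostgreSQL code behaves this
way; orchestrate_sketch() and process_one_table() are invented names):

static void
orchestrate_sketch(Portal portal, List *tables)
{
    ListCell   *cell;

    MarkPortalActive(portal);
    foreach(cell, tables)
    {
        StartTransactionCommand();
        PG_TRY();
        {
            process_one_table(lfirst(cell));    /* invented helper */
            CommitTransactionCommand();
        }
        PG_CATCH();
        {
            /*
             * Aborting reaches AtAbort_Portals(), which forces the
             * still-ACTIVE portal into FAILED state; the portal this code
             * runs under cannot stay ACTIVE, which is what such code
             * would wish.
             */
            AbortCurrentTransaction();
            FlushErrorState();
        }
        PG_END_TRY();
    }
    /* After any caught error the portal is FAILED, not ACTIVE, so this
     * would trip MarkPortalDone()'s assertion. */
    MarkPortalDone(portal);
}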
Noah Misch <noah@leadboat.com> writes:
On Sat, Jan 30, 2016 at 04:28:47PM -0500, Robert Haas wrote:
I think I've
pretty much said what I have to say about this; if nothing I wrote up
until now swayed you, it's unlikely that anything else I say after
this point will either.
Say I drop the parts that change the binary. Does the attached v2 manage to
improve PostgreSQL, or is it neutral-or-harmful like v1?
There can surely be no objection to improving these comments. However,
I'm not convinced that we should word the comments to insist that the
hypothetical cases are bugs. As I said before, I do not think there is an
API contract that would promise that portals don't reach here in ACTIVE
state. So IMO it's fair to note that no such case can arise currently,
but not to state that it's a bug if it does. So for example I'd reword
your last comment addition along the lines of "Currently, every
MarkPortalActive() caller ensures it updates the portal status again
before relinquishing control, so that ACTIVE can't happen here. If it
does happen, dispose the portal like existing MarkPortalActive() callers
would."
regards, tom lane
On Sun, Jan 31, 2016 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Noah Misch <noah@leadboat.com> writes:
On Sat, Jan 30, 2016 at 04:28:47PM -0500, Robert Haas wrote:
I think I've
pretty much said what I have to say about this; if nothing I wrote up
until now swayed you, it's unlikely that anything else I say after
this point will either.
Say I drop the parts that change the binary. Does the attached v2 manage to
improve PostgreSQL, or is it neutral-or-harmful like v1?
There can surely be no objection to improving these comments. However,
I'm not convinced that we should word the comments to insist that the
hypothetical cases are bugs. As I said before, I do not think there is an
API contract that would promise that portals don't reach here in ACTIVE
state. So IMO it's fair to note that no such case can arise currently,
but not to state that it's a bug if it does. So for example I'd reword
your last comment addition along the lines of "Currently, every
MarkPortalActive() caller ensures it updates the portal status again
before relinquishing control, so that ACTIVE can't happen here. If it
does happen, dispose the portal like existing MarkPortalActive() callers
would."
+1. I think Noah's comment additions constitute useful and helpful
information, but I too am doubtful about the characterization of the
situations in question as bugs. I don't see what the evidence for
that is, especially given that it's quite hard to predict how the code
might change in the future. However, to reiterate, a more neutral
presentation of the same facts would get my vote.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company