Another logical decoding assertion failure
/messages/by-id/BLU436-SMTP12682D628F61AB9736099C3DCE80@phx.gbl
reminded me that I also saw an assertion failure recently. Although I
wanted to isolate and report my issue after my vacation, this report
made me curious whether I had seen the same one. In the end it seems
to be a different symptom.
The following steps trigger the failure in my environment (9.5devel),
although it's not always that straightforward - sometimes I also need
to create and populate one more table. (I use the 'test_decoding'
contrib module.)
postgres=# SELECT pg_create_logical_replication_slot('my_slot',
'test_decoding');
pg_create_logical_replication_slot
------------------------------------
(my_slot,0/16F3B30)
(1 row)
postgres=# CREATE TABLE a AS SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
SELECT 0
postgres=# INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);
INSERT 0 2
postgres=# INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);
The connection to the server was lost. Attempting reset: Failed.
!>
The stack trace (should I upload / send the whole core file anywhere?):
(gdb) bt
#0 0x00007f12d0640849 in raise () from /lib64/libc.so.6
#1 0x00007f12d0641cd8 in abort () from /lib64/libc.so.6
#2 0x00000000008c2d51 in ExceptionalCondition (conditionName=0xa926a0
"!(((bool)((relation)->rd_refcnt == 0)))",
errorType=0xa92210 "FailedAssertion", fileName=0xa92104
"relcache.c", lineNumber=1892) at assert.c:54
#3 0x00000000008b2776 in RelationDestroyRelation
(relation=0x7f12c766d240, remember_tupdesc=0 '\000') at relcache.c:1892
#4 0x00000000008b2c28 in RelationClearRelation
(relation=0x7f12c766d240, rebuild=1 '\001') at relcache.c:2106
#5 0x00000000008b2fa3 in RelationFlushRelation
(relation=0x7f12c766d240) at relcache.c:2204
#6 0x00000000008b30b5 in RelationCacheInvalidateEntry
(relationId=16384) at relcache.c:2256
#7 0x00000000008abc65 in LocalExecuteInvalidationMessage
(msg=0x28c1260) at inval.c:567
#8 0x000000000073e1f5 in ReorderBufferExecuteInvalidations
(rb=0x28b0318, txn=0x28b0430) at reorderbuffer.c:1798
#9 0x000000000073ddbf in ReorderBufferForget (rb=0x28b0318, xid=1279,
lsn=24154944) at reorderbuffer.c:1645
#10 0x00000000007383f8 in DecodeCommit (ctx=0x2894658,
buf=0x7fff3e658c30, xid=1279, dboid=12736, commit_time=461418357793855,
nsubxacts=0, sub_xids=0x28af650, ninval_msgs=69, msgs=0x28af650) at
decode.c:539
#11 0x0000000000737c91 in DecodeXactOp (ctx=0x2894658,
buf=0x7fff3e658c30) at decode.c:207
#12 0x00000000007379ce in LogicalDecodingProcessRecord (ctx=0x2894658,
record=0x28af610) at decode.c:103
#13 0x000000000073b0d6 in pg_logical_slot_get_changes_guts
(fcinfo=0x7fff3e658f50, confirm=1 '\001', binary=0 '\000') at
logicalfuncs.c:432
#14 0x000000000073b221 in pg_logical_slot_get_changes
(fcinfo=0x7fff3e658f50) at logicalfuncs.c:478
#15 0x000000000063e1a3 in ExecMakeTableFunctionResult
(funcexpr=0x288be78, econtext=0x288b758, argContext=0x28b5580,
expectedDesc=0x288ccd8, randomAccess=0 '\000') at execQual.c:2157
#16 0x000000000065e302 in FunctionNext (node=0x288ba18) at
nodeFunctionscan.c:95
#17 0x000000000064537e in ExecScanFetch (node=0x288ba18,
accessMtd=0x65e24b <FunctionNext>, recheckMtd=0x65e630 <FunctionRecheck>)
at execScan.c:82
#18 0x00000000006453ed in ExecScan (node=0x288ba18, accessMtd=0x65e24b
<FunctionNext>, recheckMtd=0x65e630 <FunctionRecheck>)
at execScan.c:132
#19 0x000000000065e665 in ExecFunctionScan (node=0x288ba18) at
nodeFunctionscan.c:269
#20 0x000000000063a649 in ExecProcNode (node=0x288ba18) at
execProcnode.c:426
#21 0x000000000065ca53 in ExecModifyTable (node=0x288b8c0) at
nodeModifyTable.c:926
#22 0x000000000063a577 in ExecProcNode (node=0x288b8c0) at
execProcnode.c:377
#23 0x0000000000638465 in ExecutePlan (estate=0x288b518,
planstate=0x288b8c0, operation=CMD_INSERT, sendTuples=0 '\000',
numberTuples=0,
direction=ForwardScanDirection, dest=0x2815ac0) at execMain.c:1475
#24 0x000000000063667a in standard_ExecutorRun (queryDesc=0x288f948,
direction=ForwardScanDirection, count=0) at execMain.c:308
#25 0x0000000000636512 in ExecutorRun (queryDesc=0x288f948,
direction=ForwardScanDirection, count=0) at execMain.c:256
#26 0x00000000007998ec in ProcessQuery (plan=0x28159c8,
sourceText=0x2854d18 "INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);", params=0x0,
dest=0x2815ac0,
completionTag=0x7fff3e659920 "") at pquery.c:185
#27 0x000000000079b16e in PortalRunMulti (portal=0x2890638, isTopLevel=1
'\001', dest=0x2815ac0, altdest=0x2815ac0,
completionTag=0x7fff3e659920 "") at pquery.c:1279
#28 0x000000000079a7b1 in PortalRun (portal=0x2890638,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x2815ac0,
altdest=0x2815ac0,
completionTag=0x7fff3e659920 "") at pquery.c:816
#29 0x000000000079487b in exec_simple_query (
query_string=0x2854d18 "INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);") at postgres.c:1045
#30 0x00000000007989ec in PostgresMain (argc=1, argv=0x27eaab0,
dbname=0x27ea910 "postgres", username=0x27ea8f0 "anton") at postgres.c:4010
#31 0x0000000000721da1 in BackendRun (port=0x280c630) at postmaster.c:4113
#32 0x00000000007214de in BackendStartup (port=0x280c630) at
postmaster.c:3787
#33 0x000000000071de16 in ServerLoop () at postmaster.c:1566
// Tony
On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
> /messages/by-id/BLU436-SMTP12682D628F61AB9736099C3DCE80@phx.gbl
> reminded me that I also saw an assertion failure recently. Although I
> wanted to isolate and report my issue after my vacation, this report
> made me curious whether I had seen the same one. In the end it seems
> to be a different symptom.
That's something separate, yes.
> The following steps trigger the failure in my environment (9.5devel),
> although it's not always that straightforward - sometimes I also need
> to create and populate one more table. (I use the 'test_decoding'
> contrib module.)
>
> postgres=# SELECT pg_create_logical_replication_slot('my_slot',
> 'test_decoding');
>  pg_create_logical_replication_slot
> ------------------------------------
>  (my_slot,0/16F3B30)
> (1 row)
>
> postgres=# CREATE TABLE a AS SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
> SELECT 0
> postgres=# INSERT INTO a SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL);
> INSERT 0 2
> postgres=# INSERT INTO a SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL);
> The connection to the server was lost. Attempting reset: Failed.
> !>
Is this just to reproduce the issue, or are you really storing the
results of logical decoding in a table again?

Why? That'll just result in circularity, no? It's not something I
really thought about allowing when writing this. Possibly we can make it
work, but I don't really see the point. We obviously shouldn't just
crash, but that's not my point.
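A minimal sketch of that feedback loop, assuming the session quoted
above: the INSERT that stores the decoded rows is itself WAL-logged,
so the next call decodes it in turn:

-- round N stores rows whose insertion round N+1 will decode,
-- so the change stream grows on every call
INSERT INTO a SELECT * FROM
    pg_logical_slot_get_changes('my_slot', NULL, NULL);
-- the next call's output now also includes rows describing
-- these inserts into table public.a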
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 08/15/2014 03:16 PM, Andres Freund wrote:
> On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
>> postgres=# SELECT pg_create_logical_replication_slot('my_slot',
>> 'test_decoding');
>>  pg_create_logical_replication_slot
>> ------------------------------------
>>  (my_slot,0/16F3B30)
>> (1 row)
>>
>> postgres=# CREATE TABLE a AS SELECT * FROM
>> pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
>> SELECT 0
>> postgres=# INSERT INTO a SELECT * FROM
>> pg_logical_slot_get_changes('my_slot', NULL, NULL);
>> INSERT 0 2
>> postgres=# INSERT INTO a SELECT * FROM
>> pg_logical_slot_get_changes('my_slot', NULL, NULL);
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>
> Is this just to reproduce the issue, or are you really storing the
> results of logical decoding in a table again?
>
> Why? That'll just result in circularity, no? It's not something I
> really thought about allowing when writing this. Possibly we can make it
> work, but I don't really see the point. We obviously shouldn't just
> crash, but that's not my point.
It was basically an experiment - I tried to capture the changes into a
table. I don't know whether that's legal, i.e. whether the current call
of pg_logical_slot_get_changes() should include the changes that the
current transaction itself made. An unlogged or temporary table may be
necessary for a case like this; a sketch follows.
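A sketch of the temporary-table variant (untested; 'captured_changes'
is just an illustrative name): changes to temporary tables are not
WAL-logged, so logical decoding never sees them and capturing into one
should not feed back into the decoded stream:

-- hypothetical capture table; inserts into it never show up in the
-- decoded changes, because temp-table changes bypass the WAL
CREATE TEMP TABLE captured_changes AS
    SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL)
    WITH NO DATA;

INSERT INTO captured_changes
    SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);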
The reason I report it is that, if such a use case is not legal, it
should (I think) be detected and handled with ereport / elog.
// Tony
On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
> /messages/by-id/BLU436-SMTP12682D628F61AB9736099C3DCE80@phx.gbl
> reminded me that I also saw an assertion failure recently. Although I
> wanted to isolate and report my issue after my vacation, this report
> made me curious whether I had seen the same one. In the end it seems
> to be a different symptom.
>
> The following steps trigger the failure in my environment (9.5devel),
> although it's not always that straightforward - sometimes I also need
> to create and populate one more table. (I use the 'test_decoding'
> contrib module.)
>
> postgres=# SELECT pg_create_logical_replication_slot('my_slot',
> 'test_decoding');
>  pg_create_logical_replication_slot
> ------------------------------------
>  (my_slot,0/16F3B30)
> (1 row)
>
> postgres=# CREATE TABLE a AS SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
> SELECT 0
> postgres=# INSERT INTO a SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL);
> INSERT 0 2
> postgres=# INSERT INTO a SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL);
> The connection to the server was lost. Attempting reset: Failed.
> !>
This was actually triggered by a bug in logical decoding. I've pushed a
fix. Thanks for the report!
I still think it's a bad idea to do something like the above, but even
after a fair amount of thinking I can't come up with a case that's an
actual problem. So I think we can just allow it.
Obviously it's not a good idea to use an output plugin without
filtering, and a logged table, for this - otherwise each subsequent
get_changes() call will contain the rows that were inserted into the
table in the previous round. The attached regression test contains
pretty output demonstrating that (a sketch of such filtering follows
the output below):
BEGIN
table public.changeresult: INSERT: data[text]:'BEGIN'
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''BEGIN'''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.somechange: INSERT: id[integer]:1'''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''COMMIT'''
table public.changeresult: INSERT: data[text]:'COMMIT'
table public.changeresult: INSERT: data[text]:'BEGIN'
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''BEGIN'''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.changeresult: INSERT: data[text]:''''BEGIN'''''''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.changeresult: INSERT: data[text]:''''table public.somechange: INSERT: id[integer]:1'''''''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.changeresult: INSERT: data[text]:''''COMMIT'''''''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''COMMIT'''
table public.changeresult: INSERT: data[text]:'COMMIT'
COMMIT
(14 rows)
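A sketch, untested, of doing that filtering at the SQL level rather
than in the output plugin, assuming the changeresult table from the
regression test above: decoded rows describing the capture table
itself are discarded instead of stored, so the nesting cannot compound
(the BEGIN/COMMIT records of the capture transactions still come
through):

-- keep only decoded rows that do not describe changeresult itself
INSERT INTO changeresult
    SELECT data FROM pg_logical_slot_get_changes('my_slot', NULL, NULL)
    WHERE data NOT LIKE 'table public.changeresult:%';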
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services