BUG #13844: Logical decoding bug with subxact + row locking
The following bug has been logged on the website:
Bug reference: 13844
Logged by: Jarred Ward
Email address: jarred@webriots.com
PostgreSQL version: 9.4.5
Operating system: Linux
Description:
This is a pretty serious logical decoding bug that bit us pretty hard
in production.
Run the following code:
----------------------------------------------------------------------------
CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);
SELECT pg_create_logical_replication_slot('regression_slot',
'test_decoding');
INSERT INTO foo (b) VALUES ('john');
BEGIN;
SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
SAVEPOINT sp1;
INSERT INTO foo (b) VALUES ('jane');
ROLLBACK TO SAVEPOINT sp1;
INSERT INTO foo (b) VALUES ('doe');
END;
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL);
----------------------------------------------------------------------------
The last select above should return the decoded WAL, but instead it
returns:
----------------------------------------------------------------------------
ERROR: subxact logged without previous toplevel record
----------------------------------------------------------------------------
And the process is disconnected with no way to recover the replication
slot without dropping and recreating.
If SELECT statement in the transaction is changed the following:
----------------------------------------------------------------------------
SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
----------------------------------------------------------------------------
to remove the row level locking the correct decoded change set is
returned.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Tue, Jan 5, 2016 at 03:32:49AM +0000, jarred@webriots.com wrote:
The following bug has been logged on the website:
Bug reference: 13844
Logged by: Jarred Ward
Email address: jarred@webriots.com
PostgreSQL version: 9.4.5
Operating system: Linux
Description:This is a pretty serious logical decoding bug that bit us pretty hard
in production.Run the following code:
----------------------------------------------------------------------------
CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);SELECT pg_create_logical_replication_slot('regression_slot',
'test_decoding');INSERT INTO foo (b) VALUES ('john');
BEGIN;
SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
SAVEPOINT sp1;
INSERT INTO foo (b) VALUES ('jane');
ROLLBACK TO SAVEPOINT sp1;
INSERT INTO foo (b) VALUES ('doe');
END;SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL);----------------------------------------------------------------------------
The last select above should return the decoded WAL, but instead it
returns:----------------------------------------------------------------------------
ERROR: subxact logged without previous toplevel record----------------------------------------------------------------------------
And the process is disconnected with no way to recover the replication
slot without dropping and recreating.If SELECT statement in the transaction is changed the following:
----------------------------------------------------------------------------
SELECT * FROM foo WHERE b = 'john' FOR UPDATE;----------------------------------------------------------------------------
to remove the row level locking the correct decoded change set is
returned.
I can confirm this behavior.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Hi,
On 2016-01-05 03:32:49 +0000, jarred@webriots.com wrote:
This is a pretty serious logical decoding bug that bit us pretty hard
in production.Run the following code:
----------------------------------------------------------------------------
CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);SELECT pg_create_logical_replication_slot('regression_slot',
'test_decoding');INSERT INTO foo (b) VALUES ('john');
BEGIN;
SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
SAVEPOINT sp1;
INSERT INTO foo (b) VALUES ('jane');
ROLLBACK TO SAVEPOINT sp1;
INSERT INTO foo (b) VALUES ('doe');
END;SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL);----------------------------------------------------------------------------
The last select above should return the decoded WAL, but instead it
returns:----------------------------------------------------------------------------
ERROR: subxact logged without previous toplevel record----------------------------------------------------------------------------
Ugh, that's an annoying bug. Working on producing a fix.
The problem is that logical decoding expects to know about toplevel xids
before subtransaction/savepoint xids. Which is enforced in the WAL
logging code, so that's ok. But the problem is that right now the WAL
produced looks like (abbreviated):
rmgr: Heap tx: 585, lsn: 0/01A304A8, desc: LOCK off 1: xid 585 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 1663/12385/24607 blk 0
rmgr: Heap tx: 586, lsn: 0/01A30658, desc: INSERT off 2, blkref #0: rel 1663/12385/24607 blk 0 FPW
rmgr: Transaction tx: 586, lsn: 0/01A30798, desc: ABORT 2016-02-07 12:21:18.624045 CET
rmgr: Standby tx: 0, lsn: 0/01A307C0, desc: RUNNING_XACTS nextXid 587 latestCompletedXid 586 oldestRunningXid 585; 1 xacts: 585
rmgr: Heap tx: 587, lsn: 0/01A307F8, desc: INSERT off 3, blkref #0: rel 1663/12385/24607 blk 0
rmgr: Btree tx: 587, lsn: 0/01A30838, desc: INSERT_LEAF off 3, blkref #0: rel 1663/12385/24614 blk 1
rmgr: Transaction tx: 585, lsn: 0/01A30878, desc: COMMIT 2016-02-07 12:21:28.062978 CET; subxacts: 587
and decode.c doesn't care about LOCK records so far. Which means
reorderbuffer.c doesn't know about the relevant xid. Hm.
Greetings,
Andres Freund
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Andres Freund wrote:
Ugh, that's an annoying bug. Working on producing a fix.
The problem is that logical decoding expects to know about toplevel xids
before subtransaction/savepoint xids. Which is enforced in the WAL
logging code, so that's ok. But the problem is that right now the WAL
produced looks like (abbreviated):
rmgr: Heap tx: 585, lsn: 0/01A304A8, desc: LOCK off 1: xid 585 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 1663/12385/24607 blk 0
rmgr: Heap tx: 586, lsn: 0/01A30658, desc: INSERT off 2, blkref #0: rel 1663/12385/24607 blk 0 FPW
rmgr: Transaction tx: 586, lsn: 0/01A30798, desc: ABORT 2016-02-07 12:21:18.624045 CET
rmgr: Standby tx: 0, lsn: 0/01A307C0, desc: RUNNING_XACTS nextXid 587 latestCompletedXid 586 oldestRunningXid 585; 1 xacts: 585
rmgr: Heap tx: 587, lsn: 0/01A307F8, desc: INSERT off 3, blkref #0: rel 1663/12385/24607 blk 0
rmgr: Btree tx: 587, lsn: 0/01A30838, desc: INSERT_LEAF off 3, blkref #0: rel 1663/12385/24614 blk 1
rmgr: Transaction tx: 585, lsn: 0/01A30878, desc: COMMIT 2016-02-07 12:21:28.062978 CET; subxacts: 587and decode.c doesn't care about LOCK records so far. Which means
reorderbuffer.c doesn't know about the relevant xid. Hm.
Hi,
Any progress with this?
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 2016-03-04 12:36:32 -0300, Alvaro Herrera wrote:
Andres Freund wrote:
Ugh, that's an annoying bug. Working on producing a fix.
The problem is that logical decoding expects to know about toplevel xids
before subtransaction/savepoint xids. Which is enforced in the WAL
logging code, so that's ok. But the problem is that right now the WAL
produced looks like (abbreviated):
rmgr: Heap tx: 585, lsn: 0/01A304A8, desc: LOCK off 1: xid 585 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 1663/12385/24607 blk 0
rmgr: Heap tx: 586, lsn: 0/01A30658, desc: INSERT off 2, blkref #0: rel 1663/12385/24607 blk 0 FPW
rmgr: Transaction tx: 586, lsn: 0/01A30798, desc: ABORT 2016-02-07 12:21:18.624045 CET
rmgr: Standby tx: 0, lsn: 0/01A307C0, desc: RUNNING_XACTS nextXid 587 latestCompletedXid 586 oldestRunningXid 585; 1 xacts: 585
rmgr: Heap tx: 587, lsn: 0/01A307F8, desc: INSERT off 3, blkref #0: rel 1663/12385/24607 blk 0
rmgr: Btree tx: 587, lsn: 0/01A30838, desc: INSERT_LEAF off 3, blkref #0: rel 1663/12385/24614 blk 1
rmgr: Transaction tx: 585, lsn: 0/01A30878, desc: COMMIT 2016-02-07 12:21:28.062978 CET; subxacts: 587and decode.c doesn't care about LOCK records so far. Which means
reorderbuffer.c doesn't know about the relevant xid. Hm.Any progress with this?
Yes. The attached WIP patch fixes this, and some other issues. I'm
working on splitting it up, and adding some more testcases.
Andres
Attachments:
logical.difftext/x-diff; charset=us-asciiDownload+222-70
Andres Freund wrote:
On 2016-03-04 12:36:32 -0300, Alvaro Herrera wrote:
Any progress with this?
Yes. The attached WIP patch fixes this, and some other issues. I'm
working on splitting it up, and adding some more testcases.
Great, thanks. Looks interesting. I started to write a patch today
before posting and in doing so I started to realize it was a bit bigger
than what I was doing. I'll give it a look later.
Do you think you can get this before the next set of releases? If you
think I can help you get this done, please let me know.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 2016-03-04 15:28:40 -0300, Alvaro Herrera wrote:
Do you think you can get this before the next set of releases? If you
think I can help you get this done, please let me know.
Yes, I hope to get it done in the next few hours.
Andres
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 2016-03-04 10:30:02 -0800, Andres Freund wrote:
On 2016-03-04 15:28:40 -0300, Alvaro Herrera wrote:
Do you think you can get this before the next set of releases? If you
think I can help you get this done, please let me know.Yes, I hope to get it done in the next few hours.
Here's updated and separated versions of patches addressing all the open
logical decoding issues that I know of. I plan to take another look over
them tomorrow and then push. 0001 is the patch addressing this specific
issue.
Regards,
Andres
Attachments:
0001-logical-decoding-Tell-reorderbuffer-about-all-xids.patchtext/x-patch; charset=us-asciiDownload+108-18
0002-logical-decoding-old-newtuple-in-spooled-UPDATE-chan.patchtext/x-patch; charset=us-asciiDownload+39-13
0003-logical-decoding-Fix-handling-of-large-old-tuples-wi.patchtext/x-patch; charset=utf-8Download+189-54
Hi Jared,
On 2016-01-05 03:32:49 +0000, jarred@webriots.com wrote:
The following bug has been logged on the website:
Bug reference: 13844
Logged by: Jarred Ward
Email address: jarred@webriots.com
PostgreSQL version: 9.4.5
Operating system: Linux
Description:This is a pretty serious logical decoding bug that bit us pretty hard
in production.Run the following code:
----------------------------------------------------------------------------
CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);SELECT pg_create_logical_replication_slot('regression_slot',
'test_decoding');INSERT INTO foo (b) VALUES ('john');
BEGIN;
SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
SAVEPOINT sp1;
INSERT INTO foo (b) VALUES ('jane');
ROLLBACK TO SAVEPOINT sp1;
INSERT INTO foo (b) VALUES ('doe');
END;SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL);----------------------------------------------------------------------------
The last select above should return the decoded WAL, but instead it
returns:----------------------------------------------------------------------------
ERROR: subxact logged without previous toplevel record----------------------------------------------------------------------------
And the process is disconnected with no way to recover the replication
slot without dropping and recreating.If SELECT statement in the transaction is changed the following:
----------------------------------------------------------------------------
SELECT * FROM foo WHERE b = 'john' FOR UPDATE;----------------------------------------------------------------------------
to remove the row level locking the correct decoded change set is
returned.
I pushed a fix for this; it'll be included in the next set of
maintenance releases which are coming up soon-ish. Sorry that it took so
long.
Thanks for the report!
Regards,
Andres
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs