Why is the LSN reported for pg_logical_emit_message() different from other decoded operations?

Started by torikoshia11 days ago6 messageshackers

torikoshia@oss.nttdata.com

11 days ago

Hi,

While investigating logical decoding of pg_logical_emit_message(),
I noticed that the LSN reported for logical messages differs from
the LSN reported for other operations such as INSERT, UPDATE, and
DELETE.

For example, with the following transaction:

BEGIN;
INSERT INTO data(data) VALUES('1');
UPDATE data SET data = 'a' WHERE id = 1;
DELETE FROM data WHERE id = 1;
SELECT * FROM pg_logical_emit_message(true, 'test1', 'aaa');
INSERT INTO data(data) VALUES('2');
TRUNCATE data;
COMMIT;

=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL);
lsn | xid | data
------------+-----+------------------------------------------------------------
0/017E9640 | 696 | BEGIN 696
0/017E9640 | 696 | table public.data: INSERT: id[integer]:3
data[text]:'1'
0/017E96C0 | 696 | table public.data: UPDATE: id[integer]:1
data[text]:'a'
0/017E9740 | 696 | table public.data: DELETE: id[integer]:1
0/017E97F8 | 696 | message: transactional: 1 prefix: test1, sz: 3
content:aaa
0/017E9830 | 696 | table public.data: INSERT: id[integer]:4
data[text]:'2'
0/017EA420 | 696 | table public.data: TRUNCATE: (no-flags)
0/017EA5C0 | 696 | COMMIT 696

$ pg_waldump -f data/pg_wal/000000010000000000000001
rmgr: Heap len (rec/tot): 61/ 61, tx: 696, lsn:
0/017E9640, prev 0/017E9608, desc: INSERT off: 5, flags: 0x08, blkref
#0: rel 1663/5/16385 blk 0
..(snip)..
rmgr: Heap len (rec/tot): 72/ 72, tx: 696, lsn:
0/017E96C0, prev 0/017E9680, desc: HOT_UPDATE old_xmax: 696, old_off: 2,
old_infobits: [], flags: 0x10, new_xmax: 0, new_off: 6, blkref #0: rel
1663/5/16385 blk 0
..(snip)..
rmgr: Heap len (rec/tot): 64/ 64, tx: 696, lsn:
0/017E9740, prev 0/017E9708, desc: DELETE xmax: 696, off: 6, infobits:
[KEYS_UPDATED], flags: 0x04, blkref #0: rel 1663/5/16385 blk 0
..(snip)..
rmgr: LogicalMessage len (rec/tot): 59/ 59, tx: 696, lsn:
0/017E97B8, prev 0/017E9780, desc: MESSAGE transactional, prefix
"test1"; payload (3 bytes): 61 61 61
rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn:
0/017E97F8, prev 0/017E97B8, desc: RUNNING_XACTS nextXid 697
latestCompletedXid 695 oldestRunningXid 696; 1 xacts: 696
..(snip)..
rmgr: Heap len (rec/tot): 61/ 61, tx: 696, lsn:
0/017E9830, prev 0/017E97F8, desc: INSERT off: 7, flags: 0x08, blkref
#0: rel 1663/5/16385 blk 0

Comparing the output of pg_logical_slot_get_changes() with the
pg_waldump, the LSNs reported for INSERT, UPDATE, and DELETE
match the start LSN of the corresponding WAL records. However,
the LSN reported for the logical message does not match the
LogicalMessage WAL record itself (0/017E97B8); instead, it
matches the LSN of the following RUNNING_XACTS record
(0/017E97F8).

I found that changes such as INSERTs and UPDATEs are queued via
ReorderBufferQueueChange() using XLogRecordBuffer.origptr,
whereas logical messages are queued using
XLogRecordBuffer.endptr. This appears to explain the observed
behavior.

My question is: is there a particular reason why logical messages
use endptr instead of origptr?

Looking through the history, this behavior seems to go back to
commit 3fe3511d05127c, which added logical decoding support for
pg_logical_emit_message().

BTW The reason I started investigating this is that we
encountered a data-loss issue in the Debezium PostgreSQL
connector, which uses logical decoding for Change Data Capture.
Under certain circumstances, messages emitted by
pg_logical_emit_message() could be skipped during recovery
because the message LSN behaves differently from other decoded
operations, as described. The attached patch, which uses origptr
instead of endptr, eliminates the issue in my testing.

Of course, I think consumers of logical decoding, such as
Debezium, could work around this by treating message LSNs
differently. However, compared to other decoded operations, this
special handling feels somewhat unexpected, so I wanted to ask
whether the current behavior is intentional.

Thanks,

--
Atsushi Torikoshi
Seconded from NTT DATA CORPORATION to SRA OSS K.K.