Why is the LSN reported for pg_logical_emit_message() different from other decoded operations?

Started by torikoshia11 days ago6 messageshackers
Jump to latest
#1torikoshia
torikoshia@oss.nttdata.com

Hi,

While investigating logical decoding of pg_logical_emit_message(),
I noticed that the LSN reported for logical messages differs from
the LSN reported for other operations such as INSERT, UPDATE, and
DELETE.

For example, with the following transaction:

BEGIN;
INSERT INTO data(data) VALUES('1');
UPDATE data SET data = 'a' WHERE id = 1;
DELETE FROM data WHERE id = 1;
SELECT * FROM pg_logical_emit_message(true, 'test1', 'aaa');
INSERT INTO data(data) VALUES('2');
TRUNCATE data;
COMMIT;

=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL);
lsn | xid | data
------------+-----+------------------------------------------------------------
0/017E9640 | 696 | BEGIN 696
0/017E9640 | 696 | table public.data: INSERT: id[integer]:3
data[text]:'1'
0/017E96C0 | 696 | table public.data: UPDATE: id[integer]:1
data[text]:'a'
0/017E9740 | 696 | table public.data: DELETE: id[integer]:1
0/017E97F8 | 696 | message: transactional: 1 prefix: test1, sz: 3
content:aaa
0/017E9830 | 696 | table public.data: INSERT: id[integer]:4
data[text]:'2'
0/017EA420 | 696 | table public.data: TRUNCATE: (no-flags)
0/017EA5C0 | 696 | COMMIT 696

$ pg_waldump -f data/pg_wal/000000010000000000000001
rmgr: Heap len (rec/tot): 61/ 61, tx: 696, lsn:
0/017E9640, prev 0/017E9608, desc: INSERT off: 5, flags: 0x08, blkref
#0: rel 1663/5/16385 blk 0
..(snip)..
rmgr: Heap len (rec/tot): 72/ 72, tx: 696, lsn:
0/017E96C0, prev 0/017E9680, desc: HOT_UPDATE old_xmax: 696, old_off: 2,
old_infobits: [], flags: 0x10, new_xmax: 0, new_off: 6, blkref #0: rel
1663/5/16385 blk 0
..(snip)..
rmgr: Heap len (rec/tot): 64/ 64, tx: 696, lsn:
0/017E9740, prev 0/017E9708, desc: DELETE xmax: 696, off: 6, infobits:
[KEYS_UPDATED], flags: 0x04, blkref #0: rel 1663/5/16385 blk 0
..(snip)..
rmgr: LogicalMessage len (rec/tot): 59/ 59, tx: 696, lsn:
0/017E97B8, prev 0/017E9780, desc: MESSAGE transactional, prefix
"test1"; payload (3 bytes): 61 61 61
rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn:
0/017E97F8, prev 0/017E97B8, desc: RUNNING_XACTS nextXid 697
latestCompletedXid 695 oldestRunningXid 696; 1 xacts: 696
..(snip)..
rmgr: Heap len (rec/tot): 61/ 61, tx: 696, lsn:
0/017E9830, prev 0/017E97F8, desc: INSERT off: 7, flags: 0x08, blkref
#0: rel 1663/5/16385 blk 0

Comparing the output of pg_logical_slot_get_changes() with the
pg_waldump, the LSNs reported for INSERT, UPDATE, and DELETE
match the start LSN of the corresponding WAL records. However,
the LSN reported for the logical message does not match the
LogicalMessage WAL record itself (0/017E97B8); instead, it
matches the LSN of the following RUNNING_XACTS record
(0/017E97F8).

I found that changes such as INSERTs and UPDATEs are queued via
ReorderBufferQueueChange() using XLogRecordBuffer.origptr,
whereas logical messages are queued using
XLogRecordBuffer.endptr. This appears to explain the observed
behavior.

My question is: is there a particular reason why logical messages
use endptr instead of origptr?

Looking through the history, this behavior seems to go back to
commit 3fe3511d05127c, which added logical decoding support for
pg_logical_emit_message().

BTW The reason I started investigating this is that we
encountered a data-loss issue in the Debezium PostgreSQL
connector, which uses logical decoding for Change Data Capture.
Under certain circumstances, messages emitted by
pg_logical_emit_message() could be skipped during recovery
because the message LSN behaves differently from other decoded
operations, as described. The attached patch, which uses origptr
instead of endptr, eliminates the issue in my testing.

Of course, I think consumers of logical decoding, such as
Debezium, could work around this by treating message LSNs
differently. However, compared to other decoded operations, this
special handling feels somewhat unexpected, so I wanted to ask
whether the current behavior is intentional.

Thanks,

--
Atsushi Torikoshi
Seconded from NTT DATA CORPORATION to SRA OSS K.K.

Attachments:

v1-0001-Align-logical-message-LSN-with-other-operations.patchtext/x-diff; name=v1-0001-Align-logical-message-LSN-with-other-operations.patchDownload+2-3
#2Yugo Nagata
nagata@sraoss.co.jp
In reply to: torikoshia (#1)
Re: Why is the LSN reported for pg_logical_emit_message() different from other decoded operations?

On Mon, 8 Jun 2026 22:09:06 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:

Hi,

While investigating logical decoding of pg_logical_emit_message(),
I noticed that the LSN reported for logical messages differs from
the LSN reported for other operations such as INSERT, UPDATE, and
DELETE.

For example, with the following transaction:

BEGIN;
INSERT INTO data(data) VALUES('1');
UPDATE data SET data = 'a' WHERE id = 1;
DELETE FROM data WHERE id = 1;
SELECT * FROM pg_logical_emit_message(true, 'test1', 'aaa');
INSERT INTO data(data) VALUES('2');
TRUNCATE data;
COMMIT;

=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL);
lsn | xid | data
------------+-----+------------------------------------------------------------
0/017E9640 | 696 | BEGIN 696
0/017E9640 | 696 | table public.data: INSERT: id[integer]:3
data[text]:'1'
0/017E96C0 | 696 | table public.data: UPDATE: id[integer]:1
data[text]:'a'
0/017E9740 | 696 | table public.data: DELETE: id[integer]:1
0/017E97F8 | 696 | message: transactional: 1 prefix: test1, sz: 3
content:aaa
0/017E9830 | 696 | table public.data: INSERT: id[integer]:4
data[text]:'2'
0/017EA420 | 696 | table public.data: TRUNCATE: (no-flags)
0/017EA5C0 | 696 | COMMIT 696

$ pg_waldump -f data/pg_wal/000000010000000000000001
rmgr: Heap len (rec/tot): 61/ 61, tx: 696, lsn:
0/017E9640, prev 0/017E9608, desc: INSERT off: 5, flags: 0x08, blkref
#0: rel 1663/5/16385 blk 0
..(snip)..
rmgr: Heap len (rec/tot): 72/ 72, tx: 696, lsn:
0/017E96C0, prev 0/017E9680, desc: HOT_UPDATE old_xmax: 696, old_off: 2,
old_infobits: [], flags: 0x10, new_xmax: 0, new_off: 6, blkref #0: rel
1663/5/16385 blk 0
..(snip)..
rmgr: Heap len (rec/tot): 64/ 64, tx: 696, lsn:
0/017E9740, prev 0/017E9708, desc: DELETE xmax: 696, off: 6, infobits:
[KEYS_UPDATED], flags: 0x04, blkref #0: rel 1663/5/16385 blk 0
..(snip)..
rmgr: LogicalMessage len (rec/tot): 59/ 59, tx: 696, lsn:
0/017E97B8, prev 0/017E9780, desc: MESSAGE transactional, prefix
"test1"; payload (3 bytes): 61 61 61
rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn:
0/017E97F8, prev 0/017E97B8, desc: RUNNING_XACTS nextXid 697
latestCompletedXid 695 oldestRunningXid 696; 1 xacts: 696
..(snip)..
rmgr: Heap len (rec/tot): 61/ 61, tx: 696, lsn:
0/017E9830, prev 0/017E97F8, desc: INSERT off: 7, flags: 0x08, blkref
#0: rel 1663/5/16385 blk 0

Comparing the output of pg_logical_slot_get_changes() with the
pg_waldump, the LSNs reported for INSERT, UPDATE, and DELETE
match the start LSN of the corresponding WAL records. However,
the LSN reported for the logical message does not match the
LogicalMessage WAL record itself (0/017E97B8); instead, it
matches the LSN of the following RUNNING_XACTS record
(0/017E97F8).

I found that changes such as INSERTs and UPDATEs are queued via
ReorderBufferQueueChange() using XLogRecordBuffer.origptr,
whereas logical messages are queued using
XLogRecordBuffer.endptr. This appears to explain the observed
behavior.

My question is: is there a particular reason why logical messages
use endptr instead of origptr?

Looking through the history, this behavior seems to go back to
commit 3fe3511d05127c, which added logical decoding support for
pg_logical_emit_message().

I could not find any discussion about the LSN choice in the pgsql-hackers
thread [1]/messages/by-id/5685F999.6010202@2ndquadrant.com. Perhaps it was intended that the LSN reported for a logical
message should match the return value of pg_logical_emit_message(), as
suggested by the test you fixed, but I'm not sure.

However, the documentation [2]https://www.postgresql.org/docs/current/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-PLUGIN-MESSAGE for the callback function says:

typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr message_lsn,
bool transactional,
const char *prefix,
Size message_size,
const char *message);

... The lsn has WAL location of the message. ...

Based on my reading, the current behavior does not seem to match that
description. However, if reporting the end LSN is intentional, perhaps
the documentation should be updated to clarify that.

BTW, since the test no longer uses $message_lsn after your change, could we
remove the variable that stores the result of pg_logical_emit_message() as well?

[1]: /messages/by-id/5685F999.6010202@2ndquadrant.com
[2]: https://www.postgresql.org/docs/current/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-PLUGIN-MESSAGE

Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>

#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Yugo Nagata (#2)
Re: Why is the LSN reported for pg_logical_emit_message() different from other decoded operations?

Hello.

At Tue, 9 Jun 2026 11:23:32 +0900, Yugo Nagata <nagata@sraoss.co.jp> wrote in

Based on my reading, the current behavior does not seem to match that
description. However, if reporting the end LSN is intentional, perhaps
the documentation should be updated to clarify that.

Looking at the existing code, PostgreSQL appears to use the start LSN
for most record types, except for a few special cases such as COMMIT
records.

As for logical decoding messages, their contents are not sent to the
subscriber in the first place. Even if they were, the subscriber would
only use them to advance its received LSN. From that perspective,
PostgreSQL does not seem to care whether the LSN associated with a
message record refers to the beginning or the end of the WAL record.

Of course, this raises the question of whether any existing extension
relies on the current behavior. I don't know the answer to that.

That said, the documentation says:

https://www.postgresql.org/docs/devel/protocol-logicalrep-message-formats.html

Message

....

Int64 (XLogRecPtr)
The LSN of the logical decoding message.

The value currently comes from XLogRecordBuffer.endptr, which is
passed to logicalmsg_decode(). If the documentation is correct, then
it seems to me that this should instead use origptr.

Regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4torikoshia
torikoshia@oss.nttdata.com
In reply to: Kyotaro Horiguchi (#3)
Re: Why is the LSN reported for pg_logical_emit_message() different from other decoded operations?

Nagata-san, Horiguchi-san,

Thank you for confirming! My understanding is as follows:

- It is not clear why only logical messages return endptr.
- According to the documentation, the value should be startptr rather
than endptr.
- PostgreSQL itself does not have mechanism that uses this LSN, so
either value does not cause a problem within PostgreSQL.

Regarding the third point, I believe this distinction is
important for consumers of logical decoding. Typically, a
consumer records the LSN up to which the decoded result has been
successfully processed and uses that LSN as the restart point
after a crash or other failure. From that perspective, whether
startptr or endptr is returned matters.

With the current behavior, consumers need to be aware that only
logical messages return endptr (and this does not appear to be
documented, so one would have to discover it through testing).
Then they need special handling to translate the completion LSN
for logical messages back to the previous processing position.

On this point, I have started discussing with the Debezium
community, where I originally encountered this issue. The main
question is whether a change in PostgreSQL's behavior would be
welcomed, or whether Debezium instead treats logical messages
differently from other decoded records. If they reach some
consensus, I would be happy to share the outcome here. It is also
possible that someone from the Debezium community will comment on
this thread directly.

I think it's valuable to hear from anyone familiar with products
or applications that decode pg_logical_emit_message().

For example, I wonder whether Fujitsu's 'Userlog Operation' might
also be affected:
https://www.postgresql.fastware.com/hubfs/_Global/Manuals/FEP-v17forx86-UserlogOperationGuide.pdf

Thanks,

--
Atsushi Torikoshi

#5Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: torikoshia (#4)
RE: Why is the LSN reported for pg_logical_emit_message() different from other decoded operations?

Dear Torikoshi-san,

Sorry for the late join...

With the current behavior, consumers need to be aware that only
logical messages return endptr (and this does not appear to be
documented, so one would have to discover it through testing).
Then they need special handling to translate the completion LSN
for logical messages back to the previous processing position.

Actually I did not recognize till I found the thread, and I'm also unclear
the reason of the difference. From my perspective any blockers within
core are not found.

For example, I wonder whether Fujitsu's 'Userlog Operation' might
also be affected:
https://www.postgresql.fastware.com/hubfs/_Global/Manuals/FEP-v17forx86-
UserlogOperationGuide.pdf

FYI, I confirmed the proprietary won't be affected by the change you proposed.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

#6torikoshia
torikoshia@oss.nttdata.com
In reply to: Hayato Kuroda (Fujitsu) (#5)
Re: Why is the LSN reported for pg_logical_emit_message() different from other decoded operations?

On 2026-06-18 17:27, Hayato Kuroda (Fujitsu) wrote:

Dear Torikoshi-san,

Sorry for the late join...

With the current behavior, consumers need to be aware that only
logical messages return endptr (and this does not appear to be
documented, so one would have to discover it through testing).
Then they need special handling to translate the completion LSN
for logical messages back to the previous processing position.

Actually I did not recognize till I found the thread, and I'm also
unclear
the reason of the difference. From my perspective any blockers within
core are not found.

For example, I wonder whether Fujitsu's 'Userlog Operation' might
also be affected:
https://www.postgresql.fastware.com/hubfs/_Global/Manuals/FEP-v17forx86-
UserlogOperationGuide.pdf

FYI, I confirmed the proprietary won't be affected by the change you
proposed.

Thanks for your confirmation! I appreciate that.

On 2026-06-11 11:27, torikoshia wrote:

On this point, I have started discussing with the Debezium
community, where I originally encountered this issue. The main
question is whether a change in PostgreSQL's behavior would be
welcomed, or whether Debezium instead treats logical messages
differently from other decoded records.

FYI, this discussion is still ongoing.

--
Atsushi Torikoshi