Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
6.8.0); x86-64
Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
started seeing logical replication failures with publisher errors like this:
ERROR: invalid memory alloc request size 1196493216
(the exact size varies). Here is a typical log extract from the publisher:
2025-05-19 10:30:14 CEST [1348336-465] remote_production_user@blue DEBUG:
00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
2025-05-19 10:30:07.467048+02
2025-05-19 10:30:14 CEST [1348336-466] remote_production_user@blue LOCATION:
ProcessStandbyReplyMessage, walsender.c:2431
2025-05-19 10:30:14 CEST [1348336-467] remote_production_user@blue DEBUG:
00000: skipped replication of an empty transaction with XID: 207637565
2025-05-19 10:30:14 CEST [1348336-468] remote_production_user@blue CONTEXT:
slot "jnb_production", output plugin "pgoutput", in the commit callback,
associated LSN FB03/349FF938
2025-05-19 10:30:14 CEST [1348336-469] remote_production_user@blue LOCATION:
pgoutput_commit_txn, pgoutput.c:629
2025-05-19 10:30:14 CEST [1348336-470] remote_production_user@blue DEBUG:
00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
2025-05-19 10:30:14 CEST [1348336-471] remote_production_user@blue LOCATION:
UpdateDecodingStats, logical.c:1943
2025-05-19 10:30:14 CEST [1348336-472] remote_production_user@blue DEBUG:
00000: found top level transaction 207637519, with catalog changes
2025-05-19 10:30:14 CEST [1348336-473] remote_production_user@blue LOCATION:
SnapBuildCommitTxn, snapbuild.c:1150
2025-05-19 10:30:14 CEST [1348336-474] remote_production_user@blue DEBUG:
00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
2025-05-19 10:30:14 CEST [1348336-475] remote_production_user@blue LOCATION:
SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
2025-05-19 10:30:14 CEST [1348336-476] remote_production_user@blue ERROR:
XX000: invalid memory alloc request size 1196493216
If I'm reading it right, things go wrong on the publisher while preparing the
message, i.e. it's not a subscriber problem.
This particular instance was triggered by a large number of catalog
invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
rmgr: Transaction len (rec/tot): 10665/ 10665, tx: 207637519, lsn:
FB03/34A1AAE0, prev FB03/34A1A8C8, desc: COMMIT 2025-05-19 08:10:12.880599 CEST;
dropped stats: 2/17426/661557718 2/17426/661557717 2/17426/661557714
2/17426/661557678 2/17426/661557677 2/17426/661557674 2/17426/661557673
2/17426/661557672 2/17426/661557669 2/17426/661557618 2/17426/661557617
2/17426/661557614; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79
catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55
catcache 54 catcache 55 catcache 54 catcache 55 catcache 54 catcache 80 catcache
79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
catcache 6 catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache
55 catcache 54 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
catcache 63 catcache 63 catcache 55 catcache 54 catcache 80 catcache 79 catcache
80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache
6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache 55 catcache
54 catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55 catcache 54
catcache 55 catcache 54 catcache 55 catcache 54 catcache 63 catcache 63 catcache
63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
catcache 63 catcache 63 catcache 55 catcache 54 catcache 32 catcache 7 catcache
6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache
79 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 7 catcache
6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 32
catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80 catcache 79
catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 catcache 32 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache
63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80
catcache 79 catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 snapshot 2608 relcache 661557614 snapshot
1214 relcache 661557617 relcache 661557618 relcache 661557617 snapshot 2608
relcache 661557617 relcache 661557618 relcache 661557614 snapshot 2608 snapshot
2608 relcache 661557669 snapshot 1214 relcache 661557672 relcache 661557673
relcache 661557672 snapshot 2608 relcache 661557672 relcache 661557673 relcache
661557669 snapshot 2608 relcache 661557669 snapshot 2608 relcache 661557674
snapshot 1214 relcache 661557677 relcache 661557678 relcache 661557677 snapshot
2608 relcache 661557677 relcache 661557678 relcache 661557674 snapshot 2608
snapshot 2608 relcache 661557714 snapshot 1214 relcache 661557717 relcache
661557718 relcache 661557717 snapshot 2608 relcache 661557717 relcache 661557718
relcache 661557714 snapshot 2608 relcache 661557714 relcache 661557718 relcache
661557717 snapshot 2608 relcache 661557717 snapshot 2608 snapshot 2608 snapshot
2608 relcache 661557714 snapshot 2608 snapshot 1214 relcache 661557678 relcache
661557677 snapshot 2608 relcache 661557677 snapshot 2608 snapshot 2608 snapshot
2608 relcache 661557674 snapshot 2608 snapshot 1214 relcache 661557673 relcache
661557672 snapshot 2608 relcache 661557672 snapshot 2608 snapshot 2608 snapshot
2608 relcache 661557669 snapshot 2608 snapshot 1214 relcache 661557618 relcache
661557617 snapshot 2608 relcache 661557617 snapshot 2608 snapshot 2608 snapshot
2608 relcache 661557614 snapshot 2608 snapshot 1214
While it is long, it doesn't seem to merit allocating anything like 1GB of
memory. So I'm guessing that postgres is miscalculating the required size somehow.
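For scale, a rough sanity check on the failed request, assuming palloc's ~1 GB
MaxAllocSize cap and 16 bytes per SharedInvalidationMessage (the per-message
size is an assumption here):

```
SELECT pg_size_pretty(1196493216::bigint) AS request_size,  -- ~1141 MB, over the ~1 GB cap
       1196493216 / 16 AS approx_messages;                  -- ~75 million messages
```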
If I skip over this LSN, for example by dropping the subscription and recreating
it anew, then things go fine for a while before hitting another "invalid memory
alloc request", i.e. it wasn't just a one-off. On the other hand, after
downgrading to 17.4, subscribers spontaneously recovered and the issue has gone
away. Since I didn't skip over the last LSN of this kind, presumably 17.4
successfully serialized a message for the same problematic bit of WAL that
caused 17.5 to blow up, which suggests a regression between 17.4 and 17.5.
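For anyone needing the same workaround, here is a sketch of the
drop-and-recreate approach; the subscription name matches the slot in the log
above, while the connection string and publication name are placeholders.
Note that any changes between the old and new slot positions are lost:

```
-- On the subscriber. Recreating the subscription creates a fresh slot at the
-- publisher's current WAL position, skipping past the LSN that failed to decode.
DROP SUBSCRIPTION jnb_production;
CREATE SUBSCRIPTION jnb_production
    CONNECTION 'host=publisher.example dbname=blue user=remote_production_user'
    PUBLICATION some_publication
    WITH (copy_data = false);  -- skip the initial copy; intervening changes are lost
```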
Best wishes, Duncan.
On Mon, 19 May 2025 at 20:08, Duncan Sands <duncan.sands@deepbluecap.com> wrote:
PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
6.8.0); x86-64

Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
started seeing logical replication failures with publisher errors like this:

ERROR: invalid memory alloc request size 1196493216

...
Hi Duncan,
Thanks for reporting this.
I tried adding around 80,000 invalidations but could not reproduce the issue.
Can you share the steps to reproduce the above scenario?
Thanks and Regards,
Shlok Kyal
On Mon, May 19, 2025 at 8:08 PM Duncan Sands
<duncan.sands@deepbluecap.com> wrote:
If I'm reading it right, things go wrong on the publisher while preparing the
message, i.e. it's not a subscriber problem.
Right, I also think so.
This particular instance was triggered by a large number of catalog
invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
...
...
While it is long, it doesn't seem to merit allocating anything like 1GB of
memory. So I'm guessing that postgres is miscalculating the required size somehow.
We fixed a bug in commit 4909b38af0 to distribute invalidations at
transaction end to avoid data loss in certain cases, and that change could
cause such a problem. What puzzles me is that even prior to that commit we
would eventually end up allocating the required memory for all of a
transaction's invalidations because of the repalloc in
ReorderBufferAddInvalidations, so why does it matter with this commit? One
possibility is that we now need such allocations for multiple in-progress
transactions. I'll think more about this. It would be helpful if you could
share more details about the workload, or, if possible, a test case or
script with which we can reproduce this problem.
--
With Regards,
Amit Kapila.
On Wed, May 21, 2025 at 11:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 19, 2025 at 8:08 PM Duncan Sands
<duncan.sands@deepbluecap.com> wrote:

While it is long, it doesn't seem to merit allocating anything like 1GB of
memory. So I'm guessing that postgres is miscalculating the required size somehow.

We fixed a bug in commit 4909b38af0 to distribute invalidations at
transaction end to avoid data loss in certain cases, and that change could
cause such a problem. What puzzles me is that even prior to that commit we
would eventually end up allocating the required memory for all of a
transaction's invalidations because of the repalloc in
ReorderBufferAddInvalidations, so why does it matter with this commit? One
possibility is that we now need such allocations for multiple in-progress
transactions.
I think the problem here is that when we are distributing
invalidations to a concurrent transaction, in addition to queuing the
invalidations as a change, we also copy the distributed invalidations
along with the original transaction's invalidations via repalloc in
ReorderBufferAddInvalidations. So, when there are many in-progress
transactions, each would try to copy all its accumulated invalidations
to the remaining in-progress transactions. This could lead to such an
increase in allocation request size. However, after queuing the
change, we don't need to copy it along with the original transaction's
invalidations. This is because the copy is only required when we don't
process any changes in cases like ReorderBufferForget(). I have
analyzed all such cases, and my analysis is as follows:
ReorderBufferForget()
------------------------------
It is okay not to perform the invalidations that we got from other
concurrent transactions during ReorderBufferForget. This is because
ReorderBufferForget executes invalidations when we skip the
transaction being decoded, as it is not from a database of interest.
So, we execute them only to invalidate shared catalogs (see the comment at the
caller of ReorderBufferForget). It is sufficient to execute such
invalidations in the source transaction only because the transaction
being skipped wouldn't have loaded anything in the shared catalog.
ReorderBufferAbort()
-----------------------------
ReorderBufferAbort() processes invalidations when it has already streamed
some changes. Whenever it would have streamed the change, it would
have processed the concurrent transactions' invalidation messages that
happened before the statement that led to streaming. That should be
sufficient for us.
Consider the following variant of the original case that required the
distribution of invalidations:
1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S1: ALTER PUBLICATION pb ADD TABLE d;
5) S2: INSERT INTO unrelated_tab VALUES(1);
6) S2: ROLLBACK;
7) S2: INSERT INTO d VALUES('d3');
8) S1: INSERT INTO d VALUES('d4');
The problem with the sequence is that the insert from 3) could be
decoded *after* 4) in step 5) due to streaming, and that to decode the
insert (which happened before the ALTER) the catalog snapshot and
cache state is from *before* the ALTER TABLE. Because the transaction
started in 3) doesn't modify any catalogs, no invalidations are
executed after decoding it. The result could be that the cache looks
like it did at 3), not like after 4). However, this won't create a
problem because while streaming at 5), we would execute the invalidations
from S1 via the REORDER_BUFFER_CHANGE_INVALIDATION change queued in
ReorderBufferAddInvalidations.
ReorderBufferInvalidate
--------------------------------
The reasoning is the same as for ReorderBufferForget(), as it executes
invalidations for the same purpose, just via a different function to
avoid cleaning up the buffer at the end.
XLOG_XACT_INVALIDATIONS
-------------------------------------------
While processing XLOG_XACT_INVALIDATIONS, we don't need invalidations
accumulated from other xacts because this is a special case to execute
invalidations from a particular command (DDL) in a transaction. It
won't build any cache, so it can't create any invalid state.
--
With Regards,
Amit Kapila.
Hi Amit and Shlok, thanks for thinking about this issue. We are working on
reproducing it in our test environment. Since it seems likely to be related to
our primary database being very busy with lots of concurrency and large
transactions, we are starting by creating a streaming replication copy of our
primary server (this copy to run 17.5, with the primary on 17.4), with the idea
of then doing logical replication from the standby to see if we hit the same
issue. If so, that gives us something to poke at, and we can move on to
something better from there.
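A minimal sketch of that setup, assuming the standby is already running 17.5,
the primary has wal_level = logical, and with placeholder names throughout:

```
-- On the standby (17.5): keep the primary from removing catalog rows that
-- logical decoding on the standby still needs.
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

-- On the subscriber: same publication, but the connection (and hence the
-- logical slot) now points at the standby instead of the primary.
CREATE SUBSCRIPTION jnb_standby_test
    CONNECTION 'host=standby.example dbname=blue user=remote_production_user'
    PUBLICATION some_publication
    WITH (copy_data = false);
```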
Best wishes, Duncan.
Dear hackers,
I think the problem here is that when we are distributing
invalidations to a concurrent transaction, in addition to queuing the
invalidations as a change, we also copy the distributed invalidations
along with the original transaction's invalidations via repalloc in
ReorderBufferAddInvalidations. So, when there are many in-progress
transactions, each would try to copy all its accumulated invalidations
to the remaining in-progress transactions. This could lead to such an
increase in allocation request size. However, after queuing the
change, we don't need to copy it along with the original transaction's
invalidations. This is because the copy is only required when we don't
process any changes in cases like ReorderBufferForget(). I have
analyzed all such cases, and my analysis is as follows:
Based on the analysis, I created a PoC which avoids the repalloc().
Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
not added to the transaction's list, only queued as changes, so the repalloc
can be skipped. Also, the function distributes only the messages in the list,
so received messages won't be sent onward again.

Now a patch for PG17 is created for testing purposes. Duncan, can you apply
this and confirm whether it solves the issue?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
PG17-0001-Avoid-distributing-invalidation-messages-several-tim.patch
Based on the analysis, I created a PoC which avoids the repalloc().
Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
not added to the transaction's list, only queued as changes, so the repalloc
can be skipped. Also, the function distributes only the messages in the list,
so received messages won't be sent onward again.

Now a patch for PG17 is created for testing purposes. Duncan, can you apply
this and confirm whether it solves the issue?
Thanks Hayato Kuroda, will do; however, it may take a few days.
Best wishes, Duncan.
On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I think the problem here is that when we are distributing
invalidations to a concurrent transaction, in addition to queuing the
invalidations as a change, we also copy the distributed invalidations
along with the original transaction's invalidations via repalloc in
ReorderBufferAddInvalidations. So, when there are many in-progress
transactions, each would try to copy all its accumulated invalidations
to the remaining in-progress transactions. This could lead to such an
increase in allocation request size.
I agree with this analysis.
However, after queuing the
change, we don't need to copy it along with the original transaction's
invalidations. This is because the copy is only required when we don't
process any changes in cases like ReorderBufferForget().

It seems that we use the accumulated invalidation messages also after
replaying or concurrently aborting a transaction via
ReorderBufferExecuteInvalidations(). Do we need to consider such cases
too?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Wed, May 21, 2025 at 11:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
However, after queuing the
change, we don't need to copy it along with the original transaction's
invalidations. This is because the copy is only required when we don't
process any changes in cases like ReorderBufferForget().

It seems that we use the accumulated invalidation messages also after
replaying or concurrently aborting a transaction via
ReorderBufferExecuteInvalidations(). Do we need to consider such cases
too?
Good point. After replaying the transaction, it doesn't matter because
we would have already executed the required invalidations while
processing REORDER_BUFFER_CHANGE_INVALIDATION messages. However, for
the concurrent abort case it could matter. See my analysis below:
Simulation of concurrent abort
------------------------------------------
1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S2: INSERT INTO unrelated_tab VALUES(1);
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO unrelated_tab VALUES(2);
7) S2: ROLLBACK;
8) S2: INSERT INTO d VALUES('d3');
9) S1: INSERT INTO d VALUES('d4');
The problem with the sequence is that the insert from 3) could be
decoded *after* 5) in step 6) due to streaming and that to decode the
insert (which happened before the ALTER) the catalog snapshot and
cache state is from *before* the ALTER TABLE. Because the transaction
started in 3) doesn't actually modify any catalogs, no invalidations
are executed after decoding it. Now assume that, while decoding the insert
from 4), we detect a concurrent abort; then the distributed
invalidations won't be executed, and if we don't have the accumulated
messages in txn->invalidations, the invalidation from step 5)
won't be performed. The data loss can occur in steps 8 and 9. This is
just a theory, so I could be missing something.
If the above turns out to be a problem, one idea for fixing it is that
for the concurrent abort case (both during streaming and for prepared
transaction's processing), we still check all the remaining changes
and process only the changes related to invalidations. This has to be
done before the current txn changes are freed via
ReorderBufferResetTXN->ReorderBufferTruncateTXN.
Thoughts?
--
With Regards,
Amit Kapila.
On Wed, 21 May 2025 at 17:18, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear hackers,
Based on the analysis, I created a PoC which avoids the repalloc().
Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
not added to the transaction's list, only queued as changes, so the repalloc
can be skipped. Also, the function distributes only the messages in the list,
so received messages won't be sent onward again.

Now a patch for PG17 is created for testing purposes. Duncan, can you apply
this and confirm whether it solves the issue?
Hi,

I was able to reproduce the issue with the following test:

1. First begin 9 concurrent txns. (BEGIN; INSERT INTO t1 VALUES(11);)
2. In a 10th concurrent txn, perform 1000 DDLs (ALTER PUBLICATION ADD/DROP TABLE).
3. For each of the 9 concurrent txns, perform:
   i. 1000 DDLs
   ii. COMMIT;
   iii. BEGIN; INSERT INTO t1 VALUES(11);
4. Repeat steps 2 and 3 in a loop.
These steps reproduced the error:
2025-05-22 19:03:35.111 JST [63150] sub1 ERROR: invalid memory alloc
request size 1555752832
2025-05-22 19:03:35.111 JST [63150] sub1 STATEMENT: START_REPLICATION
SLOT "sub1" LOGICAL 0/0 (proto_version '4', streaming 'parallel',
origin 'any', publication_names '"pub1"')
I have also attached the test script for the same.
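For readers without the attachment, a rough sketch of that workload; t1 and
pub1 are the names visible in the log above, and the loop counts are
illustrative:

```
-- Sessions 1..9: each holds an open transaction with a pending change.
BEGIN;
INSERT INTO t1 VALUES (11);

-- Session 10: ~1000 DDLs in one transaction (assumes t1 starts outside pub1);
-- each DDL's invalidations get distributed to every in-progress transaction.
BEGIN;
DO $$
BEGIN
    FOR i IN 1..500 LOOP
        EXECUTE 'ALTER PUBLICATION pub1 ADD TABLE t1';
        EXECUTE 'ALTER PUBLICATION pub1 DROP TABLE t1';
    END LOOP;
END $$;
COMMIT;

-- Then, for each of sessions 1..9 in turn: run the same DDL loop, COMMIT,
-- and immediately BEGIN a new transaction with another INSERT, so roughly
-- ten transactions are always in progress. Looping over this compounds the
-- copied invalidations until the allocation request exceeds 1 GB.
```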
Also, I tried to run the test with Kuroda-san's patch and it did not
reproduce the issue.
Thanks and Regards,
Shlok Kyal
Attachments:
Dear Amit, Sawada-san,
Good point. After replaying the transaction, it doesn't matter because
we would have already executed the required invalidations while
processing REORDER_BUFFER_CHANGE_INVALIDATION messages. However, for
the concurrent abort case it could matter. See my analysis below:

Simulation of concurrent abort
------------------------------------------
1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S2: INSERT INTO unrelated_tab VALUES(1);
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO unrelated_tab VALUES(2);
7) S2: ROLLBACK;
8) S2: INSERT INTO d VALUES('d3');
9) S1: INSERT INTO d VALUES('d4');
The problem with the sequence is that the insert from 3) could be
decoded *after* 5) in step 6) due to streaming and that to decode the
insert (which happened before the ALTER) the catalog snapshot and
cache state is from *before* the ALTER TABLE. Because the transaction
started in 3) doesn't actually modify any catalogs, no invalidations
are executed after decoding it. Now assume that, while decoding the insert
from 4), we detect a concurrent abort; then the distributed
invalidations won't be executed, and if we don't have the accumulated
messages in txn->invalidations, the invalidation from step 5)
won't be performed. The data loss can occur in steps 8 and 9. This is
just a theory, so I could be missing something.
I verified whether this is real, and succeeded in reproducing it. See the
appendix for the detailed steps.
If the above turns out to be a problem, one idea for fixing it is that
for the concurrent abort case (both during streaming and for prepared
transaction's processing), we still check all the remaining changes
and process only the changes related to invalidations. This has to be
done before the current txn changes are freed via
ReorderBufferResetTXN->ReorderBufferTruncateTXN.
I roughly implemented that part; PSA the updated version. One concern is
whether we should consider the case where executing invalidations causes an
ereport(ERROR). If that happens, the walsender will exit at that point.
Appendix - reproducer
==============
Only one instance was used in the test. The defined objects were:
```
CREATE TABLE d(data text not null);
CREATE TABLE unrelated_tab(data text not null);
CREATE PUBLICATION pb;
```
Then, pg_recvlogical was used to receive the replicated changes. Actual command:
```
$ pg_recvlogical --plugin=pgoutput --create-slot --start --slot test -U postgres
-d postgres -o proto_version=4 -o publication_names=pb -o messages=true
-o streaming=true -f -
```
Below are the actual steps. The gdb debugger was used to synchronize the test.
0. Prepare two sessions, S1 and S2, and one replication connection.
1. Ran "INSERT INTO d VALUES('d1');" on S1.
2. Ran "BEGIN; INSERT INTO d VALUES('d2');" on S2.
3. Ran "INSERT INTO unrelated_tab VALUES('d2');" on S2.
4. Ran "ALTER PUBLICATION pb ADD TABLE d;" on S1.
5. Attached the walsender process via gdb.
6. Set a breakpoint at HandleConcurrentAbort.
7. Ran "INSERT INTO unrelated_tab VALUES(generate_series(1, 5000));" on S2.
   This allows the changes in S2 to be streamed.
8. Confirmed that gdb stopped the walsender process.
9. Ran the continue command in gdb several times, to ensure the process
   accesses "unrelated_tab". On my env, the backtrace at that time was [1].
10. Ran "ROLLBACK" in S2
11. On gdb session, moved forward the program and ensured that the concurrent_abort
error was raised.
12. gdb detached from the walsender
13. Ran "INSERT INTO d VALUES('d3');" on S2
14. Ran "INSERT INTO d VALUES('d4');" on S1.
15. Checked the output from pg_recvlogical, and confirmed d3 and d4 were not output [2]``` $ pg_recvlogical --plugin=pgoutput --create-slot --start --slot test -U postgres -d postgres -o proto_version=4 -o publication_names=pb -o messages=true -o streaming=true -f - S E A ```
[1]:
```
Breakpoint 1, HandleConcurrentAbort () at ../postgres/src/backend/access/index/genam.c:484
484 if (TransactionIdIsValid(CheckXidAlive) &&
(gdb) bt
#0 HandleConcurrentAbort () at ../postgres/src/backend/access/index/genam.c:484
#1 0x000000000052628a in systable_getnext (sysscan=0x31bcbd0)
at ../postgres/src/backend/access/index/genam.c:545
#2 0x0000000000b37afa in SearchCatCacheMiss (cache=0x3107180, nkeys=1, hashValue=2617776010,
hashIndex=10, v1=16389, v2=0, v3=0, v4=0)
at ../postgres/src/backend/utils/cache/catcache.c:1544
#3 0x0000000000b379a3 in SearchCatCacheInternal (cache=0x3107180, nkeys=1, v1=16389, v2=0, v3=0,
v4=0) at ../postgres/src/backend/utils/cache/catcache.c:1464
#4 0x0000000000b3769a in SearchCatCache1 (cache=0x3107180, v1=16389)
at ../postgres/src/backend/utils/cache/catcache.c:1332
#5 0x0000000000b544d5 in SearchSysCache1 (cacheId=55, key1=16389)
at ../postgres/src/backend/utils/cache/syscache.c:228
#6 0x0000000000b3e62a in get_rel_namespace (relid=16389)
at ../postgres/src/backend/utils/cache/lsyscache.c:1956
#7 0x00007fa3fdb1e0ec in get_rel_sync_entry (data=0x3160108, relation=0x7fa3fd06f398)
at ../postgres/src/backend/replication/pgoutput/pgoutput.c:2037
#8 0x00007fa3fdb1d126 in pgoutput_change (ctx=0x315fd90, txn=0x31b8aa0, relation=0x7fa3fd06f398,
change=0x31bab18) at ../postgres/src/backend/replication/pgoutput/pgoutput.c:1455
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) f 8
#8 0x00007fa3fdb1d126 in pgoutput_change (ctx=0x315fd90, txn=0x31b8aa0, relation=0x7fa3fd06f398,
change=0x31bab18) at ../postgres/src/backend/replication/pgoutput/pgoutput.c:1455
1455 relentry = get_rel_sync_entry(data, relation);
(gdb) p relation->rd_rel.relname
$2 = {data = "unrelated_tab", '\000' <repeats 50 times>}
```
[2]:
```
$ pg_recvlogical --plugin=pgoutput --create-slot --start --slot test -U postgres
-d postgres -o proto_version=4 -o publication_names=pb -o messages=true
-o streaming=true -f -
S
E
A
```
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
v2-PG17-0001-Avoid-distributing-invalidation-messages-sev.patch
On Thu, May 22, 2025 at 6:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Amit, Sawada-san,
I roughly implemented that part; PSA the updated version. One concern is
whether we should consider the case where executing invalidations causes an
ereport(ERROR). If that happens, the walsender will exit at that point.
But, in the catch part, we are already executing invalidations:
...
/* make sure there's no cache pollution */
ReorderBufferExecuteInvalidations(txn->ninvalidations, txn->invalidations);
...
So, the behaviour should be the same.
--
With Regards,
Amit Kapila.
On Thu, May 22, 2025 at 3:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, May 21, 2025 at 11:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
However, after queuing the
change, we don't need to copy it along with the original transaction's
invalidations. This is because the copy is only required when we don't
process any changes in cases like ReorderBufferForget().

It seems that we use the accumulated invalidation messages also after
replaying or concurrently aborting a transaction via
ReorderBufferExecuteInvalidations(). Do we need to consider such cases
too?

Good point. After replaying the transaction, it doesn't matter because
we would have already executed the required invalidations while
processing REORDER_BUFFER_CHANGE_INVALIDATION messages.
I think the reason why we execute all invalidation messages even in
non-concurrent-abort cases is that we also need to invalidate any caches
that are loaded during the replay. Consider the following sequence:
1) S1: CREATE TABLE d (data text not null);
2) S1: INSERT INTO d VALUES ('d1');
3) S2: BEGIN; INSERT INTO d VALUES ('d2');
4) S3: BEGIN; INSERT INTO d VALUES ('d3');
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO d VALUES ('d4');
7) S2: COMMIT;
8) S3: COMMIT;
9) S2: INSERT INTO d VALUES('d5');
10) S1: INSERT INTO d VALUES ('d6');
When replaying S2's first transaction at 7), we decode the insert from
3) using the snapshot from before the ALTER, creating the relcache
entry for table 'd'. Then we invalidate that cache via the inval
message distributed from S1's ALTER, and build the relcache again when
decoding the insert from 6); this cache reflects the state after the
ALTER. When replaying S3's transaction at 8), we should decode the
insert from 4) using the snapshot from before the ALTER. Since we call
ReorderBufferExecuteInvalidations() also in non-concurrent-abort
paths, we can invalidate the relcache built while decoding the insert
from 6). If we don't include the inval message distributed from 5) in
txn->invalidations, we don't invalidate the relcache and end up
sending the insert from 4) even though it happened before the ALTER.
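For reference, this is schematically what ReorderBufferProcessTXN()
does (heavily simplified here; the streaming/concurrent-abort handling
in the catch block is omitted): txn->invalidations is executed on both
the success path and the error path, which is what the scenario above
relies on.

PG_TRY();
{
    /* ... replay the queued changes, including any
     * REORDER_BUFFER_CHANGE_INVALIDATION changes ... */

    /*
     * On the success path, drop caches built while decoding with
     * older snapshots, so a later transaction's replay doesn't
     * reuse them.
     */
    ReorderBufferExecuteInvalidations(txn->ninvalidations,
                                      txn->invalidations);
}
PG_CATCH();
{
    /* make sure there's no cache pollution */
    ReorderBufferExecuteInvalidations(txn->ninvalidations,
                                      txn->invalidations);
    PG_RE_THROW();
}
PG_END_TRY();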
However, for the
concurrent abort case it could matter. See my analysis
below:

Simulation of concurrent abort
------------------------------------------
1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S2: INSERT INTO unrelated_tab VALUES(1);
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO unrelated_tab VALUES(2);
7) S2: ROLLBACK;
8) S2: INSERT INTO d VALUES('d3');
9) S1: INSERT INTO d VALUES('d4');

The problem with the sequence is that the insert from 3) could be
decoded *after* 5) in step 6) due to streaming, and that to decode the
insert (which happened before the ALTER) the catalog snapshot and
cache state must be from *before* the ALTER TABLE. Because the
transaction started in 3) doesn't actually modify any catalogs, no
invalidations are executed after decoding it. Now assume that, while
decoding the insert from 4), we detect a concurrent abort: the
distributed invalidation won't be executed, and if we don't have the
accumulated messages in txn->invalidations, the invalidation from
step 5) won't be performed either. Data loss can then occur in steps
8 and 9. This is just a theory, so I could be missing something.
This scenario makes sense to me. I agree that this could be a problem.
If the above turns out to be a problem, one idea for fixing it is that
for the concurrent abort case (both during streaming and for prepared
transaction processing), we still check all the remaining changes and
process only the changes related to invalidations. This has to be done
before the current txn's changes are freed via
ReorderBufferResetTXN->ReorderBufferTruncateTXN.

Thoughts?
If the above hypothesis is true, we need to consider another idea so
that we can execute invalidation messages in both cases.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Dear Hayato Kuroda, thank you so much for working on this problem. Your patch
PG17-0001-Avoid-distributing-invalidation-messages-several-tim.patch solves the
issue for me. Without it I get an invalid memory alloc request error within
about twenty minutes. With your patch, 24 hours have passed with no errors.
Best wishes, Duncan.
On 21/05/2025 13:48, Hayato Kuroda (Fujitsu) wrote:
Dear hackers,
I think the problem here is that when we are distributing
invalidations to a concurrent transaction, in addition to queuing the
invalidations as a change, we also copy the distributed invalidations
along with the original transaction's invalidations via repalloc in
ReorderBufferAddInvalidations. So, when there are many in-progress
transactions, each would try to copy all its accumulated invalidations
to the remaining in-progress transactions. This could lead to such an
increase in allocation request size. However, after queuing the
change, we don't need to copy it along with the original transaction's
invalidations. This is because the copy is only required when we don't
process any changes in cases like ReorderBufferForget(). I have
analyzed all such cases, and my analysis is as follows:

Based on the analysis, I created a PoC which avoids the repalloc().
Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval()
are no longer added to the transaction's own list; they are just queued
as changes, so the repalloc can be skipped. Also, the function
distributes only the messages in the transaction's own list, so
received messages won't be distributed again.
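As a rough illustration only (the extra parameter and its name are
invented here, and memory-context handling is omitted; the actual patch
may differ), the idea amounts to something like this:

void
ReorderBufferAddInvalidations(ReorderBuffer *rb, TransactionId xid,
                              XLogRecPtr lsn, Size nmsgs,
                              SharedInvalidationMessage *msgs,
                              bool distributed)   /* invented parameter */
{
    ReorderBufferTXN *txn = ReorderBufferTXNByXid(rb, xid, true, NULL,
                                                  lsn, true);
    ReorderBufferChange *change;

    /* Queue the invalidations as a change, as before. */
    change = ReorderBufferGetChange(rb);
    change->action = REORDER_BUFFER_CHANGE_INVALIDATION;
    change->data.inval.ninvalidations = nmsgs;
    change->data.inval.invalidations = (SharedInvalidationMessage *)
        palloc(sizeof(SharedInvalidationMessage) * nmsgs);
    memcpy(change->data.inval.invalidations, msgs,
           sizeof(SharedInvalidationMessage) * nmsgs);
    ReorderBufferQueueChange(rb, xid, lsn, change, false);

    /*
     * Accumulate into txn->invalidations only for the transaction's own
     * messages; distributed messages skip the repalloc() that produced
     * the huge allocation requests.
     */
    if (!distributed)
    {
        if (txn->ninvalidations == 0)
            txn->invalidations = (SharedInvalidationMessage *)
                palloc(sizeof(SharedInvalidationMessage) * nmsgs);
        else
            txn->invalidations = (SharedInvalidationMessage *)
                repalloc(txn->invalidations,
                         sizeof(SharedInvalidationMessage) *
                         (txn->ninvalidations + nmsgs));
        memcpy(txn->invalidations + txn->ninvalidations, msgs,
               sizeof(SharedInvalidationMessage) * nmsgs);
        txn->ninvalidations += nmsgs;
    }
}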
A patch for PG17 is now created for testing purposes. Duncan, can you
apply it and confirm whether the issue is solved?

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Sawada-san,
I think the reason why we execute all invalidation messages even in
non-concurrent-abort cases is that we also need to invalidate caches
that are loaded during the replay. Consider the following sequence:

1) S1: CREATE TABLE d (data text not null);
2) S1: INSERT INTO d VALUES ('d1');
3) S2: BEGIN; INSERT INTO d VALUES ('d2');
4) S3: BEGIN; INSERT INTO d VALUES ('d3');
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO d VALUES ('d4');
7) S2: COMMIT;
8) S3: COMMIT;
9) S2: INSERT INTO d VALUES('d5');
10) S1: INSERT INTO d VALUES ('d6');

When replaying S2's first transaction at 7), we decode the insert from
3) using the snapshot from before the ALTER, creating the relcache
entry for table 'd'. Then we invalidate that cache via the inval
message distributed from S1's ALTER, and build the relcache again when
decoding the insert from 6); this cache reflects the state after the
ALTER. When replaying S3's transaction at 8), we should decode the
insert from 4) using the snapshot from before the ALTER. Since we call
ReorderBufferExecuteInvalidations() also in non-concurrent-abort
paths, we can invalidate the relcache built while decoding the insert
from 6). If we don't include the inval message distributed from 5) in
txn->invalidations, we don't invalidate the relcache and end up
sending the insert from 4) even though it happened before the ALTER.
Thanks for providing another scenario. Let me test this workload and share the result later.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Sawada-san,
I think the reason why we execute all invalidation messages even in
non-concurrent-abort cases is that we also need to invalidate caches
that are loaded during the replay. Consider the following sequence:

1) S1: CREATE TABLE d (data text not null);
2) S1: INSERT INTO d VALUES ('d1');
3) S2: BEGIN; INSERT INTO d VALUES ('d2');
4) S3: BEGIN; INSERT INTO d VALUES ('d3');
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO d VALUES ('d4');
7) S2: COMMIT;
8) S3: COMMIT;
9) S2: INSERT INTO d VALUES('d5');
10) S1: INSERT INTO d VALUES ('d6');

When replaying S2's first transaction at 7), we decode the insert from
3) using the snapshot from before the ALTER, creating the relcache
entry for table 'd'. Then we invalidate that cache via the inval
message distributed from S1's ALTER, and build the relcache again when
decoding the insert from 6); this cache reflects the state after the
ALTER. When replaying S3's transaction at 8), we should decode the
insert from 4) using the snapshot from before the ALTER. Since we call
ReorderBufferExecuteInvalidations() also in non-concurrent-abort
paths, we can invalidate the relcache built while decoding the insert
from 6). If we don't include the inval message distributed from 5) in
txn->invalidations, we don't invalidate the relcache and end up
sending the insert from 4) even though it happened before the ALTER.
You're right. I tested the workload on the latest PG17 with the PoC
applied, and confirmed that the PoC replicated the d3 tuple, which is
not good.
If the above hypothesis is true, we need to consider another idea so
that we can execute invalidation messages in both cases.
The straightforward fix is to also check the change queue when the
transaction has invalidation messages; 0003 implements that. One
downside is that traversing the changes can affect performance:
currently we iterate over all of the changes even for a single
REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution
for now. Thoughts?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
v3-PG17-0001-Avoid-distributing-invalidation-messages-sev.patch (+114, -29)
On Mon, May 26, 2025 at 2:52 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
If the above hypothesis is true, we need to consider another idea so
that we can execute invalidation messages in both cases.

The straightforward fix is to also check the change queue when the
transaction has invalidation messages; 0003 implements that. One
downside is that traversing the changes can affect performance:
currently we iterate over all of the changes even for a single
REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution
for now.
It can impact the performance of large transactions with few
invalidations, especially ones that have spilled changes, because it
needs to traverse the entire list of changes again at the end. The
other idea would be to add new member(s) to ReorderBufferTXN to
receive distributed invalidations. As for adding the new member to
ReorderBufferTXN: (a) in HEAD, it should be okay; (b) for
back-branches, we may be able to add it at the end, but we should
check whether any extensions use sizeof(ReorderBufferTXN) and, if they
do, what we need to do about it.
I think the new member could be similar to the existing members
(uint32 ninvalidations; and SharedInvalidationMessage *invalidations;)
or a separate change queue containing only
REORDER_BUFFER_CHANGE_INVALIDATION messages. The second one is worth
considering because multiple transactions can distribute their
invalidations to a single txn in chunks, which can be stored as
separate changes; another benefit of the second one is a lower risk of
needing a large memory allocation due to repalloc. Also, it would be
easy to account for its size via ReorderBufferChangeMemoryUpdate.
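As an illustration only of the first variant (the field names here are
invented; an actual patch may well differ):

typedef struct ReorderBufferTXN
{
    /* ... existing fields unchanged ... */

    /* the transaction's own invalidation messages (existing fields) */
    uint32      ninvalidations;
    SharedInvalidationMessage *invalidations;

    /*
     * New (hypothetical): invalidation messages distributed to this
     * transaction by SnapBuildDistributeSnapshotAndInval(), kept apart
     * from the txn's own messages.  In back-branches these would be
     * appended at the end of the struct to limit ABI breakage.
     */
    uint32      ninvalidations_distributed;
    SharedInvalidationMessage *invalidations_distributed;
} ReorderBufferTXN;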
--
With Regards,
Amit Kapila.
Re: Duncan Sands:
PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
6.8.0); x86-64
Fwiw, one more field report from Debian:
We had a severe issue after upgrading postgresql-16 from 16.8-1.pgdg110+1 to 16.9-1.pgdg110+1. The following error happened quickly on a logical replication setup (sorry for the French):
ERREUR: n'a pas pu recevoir des données du flux de WAL ("could not receive data from WAL stream"): ERROR: invalid memory alloc request size 1196912896
After trying several things, we finally managed to restart the replication after downgrading the pg-related packages on this host alone (the other servers were not impacted).
https://salsa.debian.org/postgresql/postgresql/-/issues/4
(no extra details in there, though)
Christoph
On Mon, May 26, 2025 at 4:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, May 26, 2025 at 2:52 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

If the above hypothesis is true, we need to consider another idea so
that we can execute invalidation messages in both cases.

The straightforward fix is to also check the change queue when the
transaction has invalidation messages; 0003 implements that. One
downside is that traversing the changes can affect performance:
currently we iterate over all of the changes even for a single
REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution
for now.

It can impact the performance of large transactions with few
invalidations, especially ones that have spilled changes, because it
needs to traverse the entire list of changes again at the end.
Agreed.
The
other idea would be to add new member(s) to ReorderBufferTXN to
receive distributed invalidations. As for adding the new member to
ReorderBufferTXN: (a) in HEAD, it should be okay; (b) for
back-branches, we may be able to add it at the end, but we should
check whether any extensions use sizeof(ReorderBufferTXN) and, if they
do, what we need to do about it.
If we can make sure that this change won't break existing
extensions, I think this would be the most reasonable solution.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Dear Sawada-san, Amit,
It can impact the performance of large transactions with few
invalidations, especially ones that have spilled changes, because it
needs to traverse the entire list of changes again at the end.

Agreed.
The
other idea would be to add new member(s) to ReorderBufferTXN to
receive distributed invalidations. As for adding the new member to
ReorderBufferTXN: (a) in HEAD, it should be okay; (b) for
back-branches, we may be able to add it at the end, but we should
check whether any extensions use sizeof(ReorderBufferTXN) and, if they
do, what we need to do about it.

If we can make sure that this change won't break existing extensions,
I think this would be the most reasonable solution.
Based on the discussion, I created a PoC for master/PG17; please see
the attached. The basic idea is to introduce a new queue that contains
only distributed invalidation messages. Its contents are consumed at
the end of transactions. I felt some of the code could be reused, so
internal functions were introduced. At least, it passes the regression
tests and the workloads discussed here.
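As a sketch only of where the new queue would be consumed, reusing the
hypothetical invalidations_distributed fields from the struct sketch
earlier in the thread (the actual PoC may differ), the end-of-replay
and error paths would execute both lists:

/*
 * Hypothetical end-of-transaction handling: execute the txn's own
 * invalidations and then the distributed ones, on both the success
 * and the error path, mirroring the existing catch-block behaviour.
 */
ReorderBufferExecuteInvalidations(txn->ninvalidations,
                                  txn->invalidations);
ReorderBufferExecuteInvalidations(txn->ninvalidations_distributed,
                                  txn->invalidations_distributed);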
Best regards,
Hayato Kuroda
FUJITSU LIMITED