BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

Started by PG Bug reporting formover 1 year ago10 messageshackersbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org
hackersbugs

The following bug has been logged on the website:

Bug reference: 18641
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 17.0
Operating system: Ubuntu 22.04
Description:

Logical decoding of a transaction like:
BEGIN;
INSERT INTO test_tab VALUES(1);
PREPARE TRANSACTION 'pt';

where test_tab defined as:
CREATE TABLE test_tab(a int primary key,
t text DEFAULT 'Some TOASTable value';

for a subscription created with two_phase = on, fails as below:
2024-09-28 06:44:50.708 UTC [3741774:6][client backend][6/2:740] LOG:
statement: PREPARE TRANSACTION 'pt';
2024-09-28 06:44:50.709 UTC [3741774:7][client backend][:0] LOG:
disconnection: session time: 0:00:00.006 user=law database=postgres
host=[local]
2024-09-28 06:44:50.713 UTC [3741741:17][walsender][25/0:0] ERROR:
unexpected table_index_fetch_tuple call during logical decoding
2024-09-28 06:44:50.713 UTC [3741741:18][walsender][25/0:0] BACKTRACE:
table_index_fetch_tuple at tableam.h:1253:3
index_fetch_heap at indexam.c:637:10
index_getnext_slot at indexam.c:697:6
systable_getnext_ordered at genam.c:717:5
heap_fetch_toast_slice at heaptoast.c:698:17
table_relation_fetch_toast_slice at tableam.h:1924:1
toast_fetch_datum at detoast.c:379:2
detoast_attr at detoast.c:123:10
pg_detoast_datum_packed at fmgr.c:1867:10
text_to_cstring at varlena.c:220:23
AttrDefaultFetch at relcache.c:4537:17
RelationBuildTupleDesc at relcache.c:697:4
RelationBuildDesc at relcache.c:1188:24
RelationIdGetRelation at relcache.c:2116:7
ReorderBufferProcessTXN at reorderbuffer.c:2246:17
ReorderBufferReplay at reorderbuffer.c:2725:2
ReorderBufferPrepare at reorderbuffer.c:2822:2
DecodePrepare at decode.c:826:2
xact_decode at decode.c:347:5
LogicalDecodingProcessRecord at decode.c:123:1
XLogSendLogical at walsender.c:3413:33
WalSndLoop at walsender.c:2814:4
StartLogicalReplication at walsender.c:1523:2
exec_replication_command at walsender.c:2148:5
PostgresMain at postgres.c:4763:11
BackendInitialize at backend_startup.c:123:1
postmaster_child_launch at launch_backend.c:281:9
BackendStartup at postmaster.c:3593:8
ServerLoop at postmaster.c:1677:10
PostmasterMain at postmaster.c:1372:11
startup_hacks at main.c:217:1
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)
[0x7f30a90d7e40]
postgres: node_publisher: walsender law postgres [local]
START_REPLICATION(_start+0x25) [0x5647b6023565]
2024-09-28 06:44:50.713 UTC [3741741:19][walsender][25/0:0] LOG: released
logical replication slot "test_sub"
2024-09-28 06:44:50.713 UTC [3741741:20][walsender][25/0:0] LOG: could not
send data to client: Broken pipe
2024-09-28 06:44:50.713 UTC [3741741:21][walsender][25/0:0] FATAL:
connection to client lost

The issue can be easily reproduced with 022_twophase_cascade.pl modified
like this:
@@ -67,0 +68 @@ $node_A->safe_psql('postgres',
+my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
@@ -69 +70 @@ $node_B->safe_psql('postgres',
-    "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
DEFAULT now(), d bigint DEFAULT 999)"
+    "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
DEFAULT now(), d bigint DEFAULT 999, t text DEFAULT '$default')"
@@ -72 +73 @@ $node_C->safe_psql('postgres',
-    "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
DEFAULT now(), d bigint DEFAULT 999)"
+    "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
DEFAULT now(), d bigint DEFAULT 999, t text DEFAULT '$default')"

Reproduced on REL_15_STABLE (starting from a8fd13cab) .. master.

#2Takeshi Ideriha
iderihatakeshi@gmail.com
In reply to: PG Bug reporting form (#1)
hackersbugs
Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

Hi, Alexander and hackers

The following bug has been logged on the website:

Bug reference: 18641
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 17.0
Operating system: Ubuntu 22.04
Description:

Logical decoding of a transaction like:
BEGIN;
INSERT INTO test_tab VALUES(1);
PREPARE TRANSACTION 'pt';

where test_tab defined as:
CREATE TABLE test_tab(a int primary key,
t text DEFAULT 'Some TOASTable value';

for a subscription created with two_phase = on, fails as below:
2024-09-28 06:44:50.708 UTC [3741774:6][client backend][6/2:740] LOG:
statement: PREPARE TRANSACTION 'pt';
2024-09-28 06:44:50.709 UTC [3741774:7][client backend][:0] LOG:
disconnection: session time: 0:00:00.006 user=law database=postgres
host=[local]
2024-09-28 06:44:50.713 UTC [3741741:17][walsender][25/0:0] ERROR:
unexpected table_index_fetch_tuple call during logical decoding
2024-09-28 06:44:50.713 UTC [3741741:18][walsender][25/0:0] BACKTRACE:
table_index_fetch_tuple at tableam.h:1253:3
index_fetch_heap at indexam.c:637:10
index_getnext_slot at indexam.c:697:6
systable_getnext_ordered at genam.c:717:5
heap_fetch_toast_slice at heaptoast.c:698:17
table_relation_fetch_toast_slice at tableam.h:1924:1
toast_fetch_datum at detoast.c:379:2
detoast_attr at detoast.c:123:10
pg_detoast_datum_packed at fmgr.c:1867:10
text_to_cstring at varlena.c:220:23
AttrDefaultFetch at relcache.c:4537:17
RelationBuildTupleDesc at relcache.c:697:4
RelationBuildDesc at relcache.c:1188:24
RelationIdGetRelation at relcache.c:2116:7
ReorderBufferProcessTXN at reorderbuffer.c:2246:17
ReorderBufferReplay at reorderbuffer.c:2725:2
ReorderBufferPrepare at reorderbuffer.c:2822:2
DecodePrepare at decode.c:826:2
xact_decode at decode.c:347:5
LogicalDecodingProcessRecord at decode.c:123:1
XLogSendLogical at walsender.c:3413:33
WalSndLoop at walsender.c:2814:4
StartLogicalReplication at walsender.c:1523:2
exec_replication_command at walsender.c:2148:5
PostgresMain at postgres.c:4763:11
BackendInitialize at backend_startup.c:123:1
postmaster_child_launch at launch_backend.c:281:9
BackendStartup at postmaster.c:3593:8
ServerLoop at postmaster.c:1677:10
PostmasterMain at postmaster.c:1372:11
startup_hacks at main.c:217:1
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)
[0x7f30a90d7e40]
postgres: node_publisher: walsender law postgres [local]
START_REPLICATION(_start+0x25) [0x5647b6023565]
2024-09-28 06:44:50.713 UTC [3741741:19][walsender][25/0:0] LOG: released
logical replication slot "test_sub"
2024-09-28 06:44:50.713 UTC [3741741:20][walsender][25/0:0] LOG: could not
send data to client: Broken pipe
2024-09-28 06:44:50.713 UTC [3741741:21][walsender][25/0:0] FATAL:
connection to client lost

The issue can be easily reproduced with 022_twophase_cascade.pl modified
like this:
@@ -67,0 +68 @@ $node_A->safe_psql('postgres',
+my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
@@ -69 +70 @@ $node_B->safe_psql('postgres',
-    "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
DEFAULT now(), d bigint DEFAULT 999)"
+    "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
DEFAULT now(), d bigint DEFAULT 999, t text DEFAULT '$default')"
@@ -72 +73 @@ $node_C->safe_psql('postgres',
-    "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
DEFAULT now(), d bigint DEFAULT 999)"
+    "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
DEFAULT now(), d bigint DEFAULT 999, t text DEFAULT '$default')"

Reproduced on REL_15_STABLE (starting from a8fd13cab) .. master.

Thank you for reporting the issue.
I was able to reproduce the issue by modifying 022_twophase_cascade.pl
accordingly.

The scan for toast index is actually done under systable_getnext_ordered,
where HandleConcurrentAbort() is called. So it seems to me that this
scan is actually safe for concurrent abort in logical decoding.
Logic around HandleConcurrentAbort is intorduced
https://github.com/postgres/postgres/commit/7259736a6e5b7c7588fff9578370736a6648acbb.

Though I may not understand the logic around HandleConcurrentAbort
fully and I am not sure not-setting bsysscan at
systable_beginscan_ordered is intentional,
it seems to me setting and unsetting a bsysscan flag in
systable_beginscan_ordered and systable_endscan_ordered would resolve
the issue.
Patch is attached.

Regards,
Takeshi Ideriha

Attachments:

0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered.patchapplication/octet-stream; name=0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered.patchDownload+17-1
#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Takeshi Ideriha (#2)
hackersbugs
Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

On Mon, Sep 30, 2024 at 6:46 AM Takeshi Ideriha
<iderihatakeshi@gmail.com> wrote:

Patch is attached.

Thanks for the patch. I'll look into it.

--
With Regards,
Amit Kapila.

#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Takeshi Ideriha (#2)
hackersbugs
Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

On Mon, Sep 30, 2024 at 6:46 AM Takeshi Ideriha
<iderihatakeshi@gmail.com> wrote:

Thank you for reporting the issue.
I was able to reproduce the issue by modifying 022_twophase_cascade.pl
accordingly.

The scan for toast index is actually done under systable_getnext_ordered,
where HandleConcurrentAbort() is called. So it seems to me that this
scan is actually safe for concurrent abort in logical decoding.
Logic around HandleConcurrentAbort is intorduced
https://github.com/postgres/postgres/commit/7259736a6e5b7c7588fff9578370736a6648acbb.

Though I may not understand the logic around HandleConcurrentAbort
fully and I am not sure not-setting bsysscan at
systable_beginscan_ordered is intentional,
it seems to me setting and unsetting a bsysscan flag in
systable_beginscan_ordered and systable_endscan_ordered would resolve
the issue.

We forgot to set/unset the flag in functions
systable_beginscan_ordered and systable_endscan_ordered. BTW,
shouldn't this occur even without prepare transaction? If so, we need
to backpatch this till 14.

Also, it is better to have a test for this, and let's ensure that the
new test doesn't increase the regression time too much if possible.

One minor point:
+
+    /*
+ * If CheckXidAlive is set then set a flag to indicate that system table

The indentation in the first comment line seems off.

--
With Regards,
Amit Kapila.

#5Takeshi Ideriha
iderihatakeshi@gmail.com
In reply to: Amit Kapila (#4)
hackersbugs
Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

Hi, Amit
(adding hackers for discussion)

We forgot to set/unset the flag in functions
systable_beginscan_ordered and systable_endscan_ordered. BTW,

Thank you for the clarification.

shouldn't this occur even without prepare transaction? If so, we need
to backpatch this till 14.

Yes, it occurred also at PG14.
I did some tests using 015_stream.pl with some modification like
below, which tests the subscription about stream is on but two-phase
is off.
The same issue occurred at both current head source code and PG14.

```
--- a/src/test/subscription/t/015_stream.pl
+++ b/src/test/subscription/t/015_stream.pl
@@ -134,9 +134,11 @@ my $node_subscriber =
PostgreSQL::Test::Cluster->new('subscriber');
 $node_subscriber->init;
 $node_subscriber->start;
+my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
+
 # Create some preexisting content on publisher
 $node_publisher->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea)");
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default')");
 $node_publisher->safe_psql('postgres',
        "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");

@@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE
TABLE test_tab_2 (a int)");

 # Setup structure on subscriber
 $node_subscriber->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea, c
timestamptz DEFAULT now(), d bigint DEFAULT 999)"
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT
999)"
 );
```

Logs from head source:
```
2024-10-01 12:34:56.694 UTC [575202] LOG: starting PostgreSQL 18devel
on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement: BEGIN;
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement:
INSERT INTO test_tab SELECT i, sha256(i::text::bytea) FROM
generate_series(3, 5000) s(i);
2024-10-01 12:34:57.524 UTC [575242] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.524 UTC [575242] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: released logical
replication slot "tap_sub"
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 12:34:57.525 UTC [575242] tap_sub FATAL: connection to client lost
...
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: starting logical
decoding for slot "tap_sub"
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: Streaming
transactions committing after 0/1583A68, reading WAL from 0/1583A30.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: logical decoding
found consistent point at 0/1583A30
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: There are no
running transactions.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.831 UTC [575260] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.831 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: released logical
replication slot "tap_sub"
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 12:34:57.832 UTC [575260] tap_sub FATAL: connection to client lost
```

Logs from PG14 source:

```
2024-10-01 13:20:29.409 UTC [593696] LOG: starting PostgreSQL 14.9 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement: BEGIN;
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement:
INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3,
5000) s(i);
2024-10-01 13:20:31.301 UTC [593740] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.301 UTC [593740] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.302 UTC [593740] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 13:20:31.302 UTC [593740] tap_sub FATAL: connection to client lost
...
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.830 UTC [593756] tap_sub LOG: logical decoding
found consistent point at 0/172C758
2024-10-01 13:20:31.830 UTC [593756] tap_sub DETAIL: There are no
running transactions.
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.834 UTC [593756] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.834 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.835 UTC [593756] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 13:20:31.835 UTC [593756] tap_sub FATAL: connection to client lost
```

Also, it is better to have a test for this, and let's ensure that the
new test doesn't increase the regression time too much if possible.

Sure. I'm going to add some test codes in a few days.

The indentation in the first comment line seems off.

Opps, thank you. Fixed.

regards,
Takeshi Ideriha

Attachments:

0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-v2.patchapplication/x-patch; name=0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-v2.patchDownload+16-1
#6Takeshi Ideriha
iderihatakeshi@gmail.com
In reply to: Takeshi Ideriha (#5)
hackersbugs
Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

Hi, Amit
(Sending this again due to previous email got moderated because I
tried to multi post to hackers and bugs.)

We forgot to set/unset the flag in functions
systable_beginscan_ordered and systable_endscan_ordered. BTW,

Thank you for the clarification.

shouldn't this occur even without prepare transaction? If so, we need
to backpatch this till 14.

Yes, it occurred also at PG14.
I did some tests using 015_stream.pl with some modification like
below, which tests the subscription about stream is on but two-phase
is off.
The same issue occurred at both current head source code and PG14.

```
--- a/src/test/subscription/t/015_stream.pl
+++ b/src/test/subscription/t/015_stream.pl
@@ -134,9 +134,11 @@ my $node_subscriber =
PostgreSQL::Test::Cluster->new
('subscriber');
 $node_subscriber->init;
 $node_subscriber->start;
+my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
+
 # Create some preexisting content on publisher
 $node_publisher->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea)");
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default')");
 $node_publisher->safe_psql('postgres',
        "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");

@@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE
TABLE test_tab_2 (a int)");

 # Setup structure on subscriber
 $node_subscriber->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea, c
timestamptz DEFAULT now(), d bigint DEFAULT 999)"
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT
999)"
 );
```

Logs from head source:
```
2024-10-01 12:34:56.694 UTC [575202] LOG: starting PostgreSQL 18devel
on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement: BEGIN;
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement:
INSERT INTO test_tab SELECT i, sha256(i::text::bytea) FROM
generate_series(3, 5000) s(i);
2024-10-01 12:34:57.524 UTC [575242] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.524 UTC [575242] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: released logical
replication slot "tap_sub"
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 12:34:57.525 UTC [575242] tap_sub FATAL: connection to client lost
...
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: starting logical
decoding for slot "tap_sub"
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: Streaming
transactions committing after 0/1583A68, reading WAL from 0/1583A30.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: logical decoding
found consistent point at 0/1583A30
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: There are no
running transactions.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.831 UTC [575260] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.831 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: released logical
replication slot "tap_sub"
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 12:34:57.832 UTC [575260] tap_sub FATAL: connection to client lost
```

Logs from PG14 source:

```
2024-10-01 13:20:29.409 UTC [593696] LOG: starting PostgreSQL 14.9 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement: BEGIN;
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement:
INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3,
5000) s(i);
2024-10-01 13:20:31.301 UTC [593740] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.301 UTC [593740] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.302 UTC [593740] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 13:20:31.302 UTC [593740] tap_sub FATAL: connection to client lost
...
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.830 UTC [593756] tap_sub LOG: logical decoding
found consistent point at 0/172C758
2024-10-01 13:20:31.830 UTC [593756] tap_sub DETAIL: There are no
running transactions.
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.834 UTC [593756] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.834 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.835 UTC [593756] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 13:20:31.835 UTC [593756] tap_sub FATAL: connection to client lost
```

Also, it is better to have a test for this, and let's ensure that the
new test doesn't increase the regression time too much if possible.

Sure. I'm going to add some test codes in a few days.

The indentation in the first comment line seems off.

Opps, thank you. Fixed.

regards,
Takeshi Ideriha

Attachments:

0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-v2.patchapplication/octet-stream; name=0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-v2.patchDownload+16-1
#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Takeshi Ideriha (#5)
hackersbugs
Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

On Tue, Oct 1, 2024 at 7:07 PM Takeshi Ideriha <iderihatakeshi@gmail.com> wrote:

We forgot to set/unset the flag in functions
systable_beginscan_ordered and systable_endscan_ordered. BTW,

Thank you for the clarification.

shouldn't this occur even without prepare transaction? If so, we need
to backpatch this till 14.

Yes, it occurred also at PG14.
I did some tests using 015_stream.pl with some modification like
below, which tests the subscription about stream is on but two-phase
is off.
The same issue occurred at both current head source code and PG14.

```
--- a/src/test/subscription/t/015_stream.pl
+++ b/src/test/subscription/t/015_stream.pl
@@ -134,9 +134,11 @@ my $node_subscriber =
PostgreSQL::Test::Cluster->new('subscriber');
$node_subscriber->init;
$node_subscriber->start;
+my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
+
# Create some preexisting content on publisher
$node_publisher->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea)");
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default')");
$node_publisher->safe_psql('postgres',
"INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");

@@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE
TABLE test_tab_2 (a int)");

# Setup structure on subscriber
$node_subscriber->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea, c
timestamptz DEFAULT now(), d bigint DEFAULT 999)"
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT
999)"
);
```

Logs from head source:
```
2024-10-01 12:34:56.694 UTC [575202] LOG: starting PostgreSQL 18devel
on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement: BEGIN;
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement:
INSERT INTO test_tab SELECT i, sha256(i::text::bytea) FROM
generate_series(3, 5000) s(i);
2024-10-01 12:34:57.524 UTC [575242] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.524 UTC [575242] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: released logical
replication slot "tap_sub"
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 12:34:57.525 UTC [575242] tap_sub FATAL: connection to client lost
...
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: starting logical
decoding for slot "tap_sub"
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: Streaming
transactions committing after 0/1583A68, reading WAL from 0/1583A30.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: logical decoding
found consistent point at 0/1583A30
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: There are no
running transactions.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.831 UTC [575260] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.831 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: released logical
replication slot "tap_sub"
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 12:34:57.832 UTC [575260] tap_sub FATAL: connection to client lost
```

Logs from PG14 source:

```
2024-10-01 13:20:29.409 UTC [593696] LOG: starting PostgreSQL 14.9 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement: BEGIN;
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement:
INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3,
5000) s(i);
2024-10-01 13:20:31.301 UTC [593740] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.301 UTC [593740] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.302 UTC [593740] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 13:20:31.302 UTC [593740] tap_sub FATAL: connection to client lost
...
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.830 UTC [593756] tap_sub LOG: logical decoding
found consistent point at 0/172C758
2024-10-01 13:20:31.830 UTC [593756] tap_sub DETAIL: There are no
running transactions.
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.834 UTC [593756] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.834 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.835 UTC [593756] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 13:20:31.835 UTC [593756] tap_sub FATAL: connection to client lost
```

Also, it is better to have a test for this, and let's ensure that the
new test doesn't increase the regression time too much if possible.

Sure. I'm going to add some test codes in a few days.

The above test shown in the email will work to test this issue.
However, we should use 'debug_logical_replication_streaming' as that
will help to reproduce the issue with just one write operation. Is it
possible to add the test in 018_stream_subxact_abort where we already
use it?

--
With Regards,
Amit Kapila.

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Takeshi Ideriha (#5)
hackersbugs
Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

On Tue, Oct 1, 2024 at 7:07 PM Takeshi Ideriha <iderihatakeshi@gmail.com> wrote:

Sending it again to avoid getting it stuck because the original was
replied to both -hackers and -bugs.

We forgot to set/unset the flag in functions
systable_beginscan_ordered and systable_endscan_ordered. BTW,

Thank you for the clarification.

shouldn't this occur even without prepare transaction? If so, we need
to backpatch this till 14.

Yes, it occurred also at PG14.
I did some tests using 015_stream.pl with some modification like
below, which tests the subscription about stream is on but two-phase
is off.
The same issue occurred at both current head source code and PG14.

```
--- a/src/test/subscription/t/015_stream.pl
+++ b/src/test/subscription/t/015_stream.pl
@@ -134,9 +134,11 @@ my $node_subscriber =
PostgreSQL::Test::Cluster->new('subscriber');
$node_subscriber->init;
$node_subscriber->start;
+my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
+
# Create some preexisting content on publisher
$node_publisher->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea)");
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default')");
$node_publisher->safe_psql('postgres',
"INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");

@@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE
TABLE test_tab_2 (a int)");

# Setup structure on subscriber
$node_subscriber->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea, c
timestamptz DEFAULT now(), d bigint DEFAULT 999)"
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT
999)"
);
```

Logs from head source:
```
2024-10-01 12:34:56.694 UTC [575202] LOG: starting PostgreSQL 18devel
on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement: BEGIN;
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement:
INSERT INTO test_tab SELECT i, sha256(i::text::bytea) FROM
generate_series(3, 5000) s(i);
2024-10-01 12:34:57.524 UTC [575242] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.524 UTC [575242] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: released logical
replication slot "tap_sub"
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 12:34:57.525 UTC [575242] tap_sub FATAL: connection to client lost
...
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: starting logical
decoding for slot "tap_sub"
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: Streaming
transactions committing after 0/1583A68, reading WAL from 0/1583A30.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: logical decoding
found consistent point at 0/1583A30
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: There are no
running transactions.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.831 UTC [575260] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.831 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: released logical
replication slot "tap_sub"
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 12:34:57.832 UTC [575260] tap_sub FATAL: connection to client lost
```

Logs from PG14 source:

```
2024-10-01 13:20:29.409 UTC [593696] LOG: starting PostgreSQL 14.9 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement: BEGIN;
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement:
INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3,
5000) s(i);
2024-10-01 13:20:31.301 UTC [593740] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.301 UTC [593740] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.302 UTC [593740] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 13:20:31.302 UTC [593740] tap_sub FATAL: connection to client lost
...
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.830 UTC [593756] tap_sub LOG: logical decoding
found consistent point at 0/172C758
2024-10-01 13:20:31.830 UTC [593756] tap_sub DETAIL: There are no
running transactions.
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.834 UTC [593756] tap_sub ERROR: unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.834 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.835 UTC [593756] tap_sub LOG: could not send data
to client: Broken pipe
2024-10-01 13:20:31.835 UTC [593756] tap_sub FATAL: connection to client lost
```

Also, it is better to have a test for this, and let's ensure that the
new test doesn't increase the regression time too much if possible.

Sure. I'm going to add some test codes in a few days.

The above test shown in the email will work to test this issue.
However, we should use 'debug_logical_replication_streaming' as that
will help to reproduce the issue with just one write operation. Is it
possible to add the test in 018_stream_subxact_abort where we already
use it?

--
With Regards,
Amit Kapila.

#9Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#8)
hackersbugs
RE: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

On Saturday, October 5, 2024 6:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sending it again to avoid getting it stuck because the original was replied to both -hackers and -bugs.

On Tue, Oct 1, 2024 at 7:07 PM Takeshi Ideriha
<iderihatakeshi@gmail.com>
wrote:

We forgot to set/unset the flag in functions
systable_beginscan_ordered and systable_endscan_ordered. BTW,

Thank you for the clarification.

shouldn't this occur even without prepare transaction? If so, we
need to backpatch this till 14.

Yes, it occurred also at PG14.
I did some tests using 015_stream.pl with some modification like
below, which tests the subscription about stream is on but two-phase
is off.
The same issue occurred at both current head source code and PG14.

```
--- a/src/test/subscription/t/015_stream.pl
+++ b/src/test/subscription/t/015_stream.pl
@@ -134,9 +134,11 @@ my $node_subscriber = 
PostgreSQL::Test::Cluster->new('subscriber');
$node_subscriber->init;
$node_subscriber->start;
+my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
+
# Create some preexisting content on publisher 
$node_publisher->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea)");
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default')");
$node_publisher->safe_psql('postgres',
"INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");

@@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE
TABLE test_tab_2 (a int)");

# Setup structure on subscriber
$node_subscriber->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea, c
timestamptz DEFAULT now(), d bigint DEFAULT 999)"
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT 
999)"

Also, it is better to have a test for this, and let's ensure that
the new test doesn't increase the regression time too much if possible.

Sure. I'm going to add some test codes in a few days.

The above test shown in the email will work to test this issue.
However, we should use 'debug_logical_replication_streaming' as that
will help to reproduce the issue with just one write operation. Is it
possible to add the test in 018_stream_subxact_abort where we already use it?

I think it might be better to put the test in test_decoding/sql/stream.sql as
the bug is only related to the decoding part. Writing it in sql test also seems cheaper.

Just share a top-up patch 0002 which includes the tests I suggested for
reference. The test would fail without the fix.

Node that I didn't change any logic in 0001 as the fix looks good to
me. I just ran the pgindent for the 0001.

Best Regards,
Hou zj

Attachments:

v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered.patchapplication/octet-stream; name=v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered.patchDownload+16-1
v3-0002-add-a-testcase-in-stream.sql.patchapplication/octet-stream; name=v3-0002-add-a-testcase-in-stream.sql.patchDownload+44-1
#10Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#9)
hackersbugs
RE: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

On Monday, October 7, 2024 1:57 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:

On Saturday, October 5, 2024 6:12 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

Sending it again to avoid getting it stuck because the original was replied to
both -hackers and -bugs.

On Tue, Oct 1, 2024 at 7:07 PM Takeshi Ideriha
<iderihatakeshi@gmail.com>
wrote:

We forgot to set/unset the flag in functions
systable_beginscan_ordered and systable_endscan_ordered. BTW,

Thank you for the clarification.

shouldn't this occur even without prepare transaction? If so, we
need to backpatch this till 14.

Yes, it occurred also at PG14.
I did some tests using 015_stream.pl with some modification like
below, which tests the subscription about stream is on but two-phase
is off.
The same issue occurred at both current head source code and PG14.

```
--- a/src/test/subscription/t/015_stream.pl
+++ b/src/test/subscription/t/015_stream.pl
@@ -134,9 +134,11 @@ my $node_subscriber =
PostgreSQL::Test::Cluster->new('subscriber');
$node_subscriber->init;
$node_subscriber->start;
+my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
+
# Create some preexisting content on publisher
$node_publisher->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea)");
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default')");
$node_publisher->safe_psql('postgres',
"INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");

@@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE
TABLE test_tab_2 (a int)");

# Setup structure on subscriber
$node_subscriber->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea, c
timestamptz DEFAULT now(), d bigint DEFAULT 999)"
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT
999)"

Also, it is better to have a test for this, and let's ensure that
the new test doesn't increase the regression time too much if possible.

Sure. I'm going to add some test codes in a few days.

The above test shown in the email will work to test this issue.
However, we should use 'debug_logical_replication_streaming' as that
will help to reproduce the issue with just one write operation. Is it
possible to add the test in 018_stream_subxact_abort where we already use

it?

I think it might be better to put the test in test_decoding/sql/stream.sql as the
bug is only related to the decoding part. Writing it in sql test also seems
cheaper.

Just share a top-up patch 0002 which includes the tests I suggested for
reference. The test would fail without the fix.

Node that I didn't change any logic in 0001 as the fix looks good to me. I just ran
the pgindent for the 0001.

While testing the fix on back-branches, I realized that we don't have the
debug_logical_replication_streaming GUC on PG14 and 15. So I tried to generate
enough data to exceed the logical_decoding_work_mem in the test instead. Please
see the attachment for reference. I also rebased 0001 on back-branches, so just
share them here as well.

Best Regards,
Hou zj

Attachments:

v3-0002-add-a-testcase-in-stream.sql-PG14-PG15.patchapplication/octet-stream; name=v3-0002-add-a-testcase-in-stream.sql-PG14-PG15.patchDownload+38-1
v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-PG14-PG15.patchapplication/octet-stream; name=v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-PG14-PG15.patchDownload+16-1
v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-PG17.patchapplication/octet-stream; name=v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-PG17.patchDownload+16-1
v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-HEAD.patchapplication/octet-stream; name=v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-HEAD.patchDownload+16-1
v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-PG16.patchapplication/octet-stream; name=v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-PG16.patchDownload+16-1
v3-0002-add-a-testcase-in-stream.sql-PG16.patchapplication/octet-stream; name=v3-0002-add-a-testcase-in-stream.sql-PG16.patchDownload+44-1
v3-0002-add-a-testcase-in-stream.sql-HEAD.patchapplication/octet-stream; name=v3-0002-add-a-testcase-in-stream.sql-HEAD.patchDownload+44-1
v3-0002-add-a-testcase-in-stream.sql-PG17.patchapplication/octet-stream; name=v3-0002-add-a-testcase-in-stream.sql-PG17.patchDownload+44-1