[bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Started by Drouvot, Bertrandover 4 years ago23 messages
#1Drouvot, Bertrand
bdrouvot@amazon.com
1 attachment(s)

Hi hackers,

During logical decoding, we have recently observed (and reported in [1]/messages/by-id/CAA4eK1KcUPwwhDVhJmdQExc09AzEBZMGbOa-u3DYaJs1zzfEnA@mail.gmail.com)
errors like:

ERROR:  could not open relation with OID 0

After investigating a recent issue on a PostgreSQL database that was
encountering this error, we found that the logical decoding of relation
rewrite with toast could produce this error without resetting the
toast_hash.

We were able to create this repro of the error:

postgres=# \! cat bdt_repro.sql
select pg_create_logical_replication_slot('bdt_slot','test_decoding');
CREATE TABLE tbl1 (a INT, b TEXT);
CREATE TABLE tbl2 (a INT);
ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
BEGIN;
INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
ALTER TABLE tbl1 ADD COLUMN id serial primary key;
INSERT INTO tbl2 VALUES(1);
commit;
select * from pg_logical_slot_get_changes('bdt_slot', null, null);

That ends up on 12.5 with:

ERROR:  could not open relation with OID 0

And on current master with:

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The issue has been introduced by 325f2ec555 (and more precisely by its
change in reorderbuffer.c), so it does affect pre v11 versions:

git branch -r --contains 325f2ec555
  origin/HEAD -> origin/master
  origin/REL_11_STABLE
  origin/REL_12_STABLE
  origin/REL_13_STABLE
  origin/REL_14_STABLE
  origin/master

The fact that current master is producing a different behavior than 12.5
(for example) is due to 4daa140a2f that generates a failed assertion in
such a case (after going to the code path that should print out the ERROR):

#2  0x0000000000b29fab in ExceptionalCondition (conditionName=0xce6850
"(rb->size >= sz) && (txn->size >= sz)", errorType=0xce5f84
"FailedAssertion", fileName=0xce5fd0 "reorderbuffer.c", lineNumber=3141)
at assert.c:69
#3  0x00000000008ff1fb in ReorderBufferChangeMemoryUpdate (rb=0x11a7a40,
change=0x11c94b8, addition=false) at reorderbuffer.c:3141
#4  0x00000000008fab27 in ReorderBufferReturnChange (rb=0x11a7a40,
change=0x11c94b8, upd_mem=true) at reorderbuffer.c:477
#5  0x0000000000902ec1 in ReorderBufferToastReset (rb=0x11a7a40,
txn=0x11b1998) at reorderbuffer.c:4799
#6  0x00000000008faaa2 in ReorderBufferReturnTXN (rb=0x11a7a40,
txn=0x11b1998) at reorderbuffer.c:448
#7  0x00000000008fc95b in ReorderBufferCleanupTXN (rb=0x11a7a40,
txn=0x11b1998) at reorderbuffer.c:1540

I am adding Amit to this thread to make him aware than this is related
to the recent issue Jeremy and I were talking about in [1]/messages/by-id/CAA4eK1KcUPwwhDVhJmdQExc09AzEBZMGbOa-u3DYaJs1zzfEnA@mail.gmail.com - which we
now believe is not linked to the logical decoding and speculative insert
bug fixed in 4daa140a2f but likely to this new toast rewrite bug.

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the error
message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so that
the logical decoding in the above repro is working).

I am adding this patch to the next commitfest.

Thanks
Bertrand

[1]: /messages/by-id/CAA4eK1KcUPwwhDVhJmdQExc09AzEBZMGbOa-u3DYaJs1zzfEnA@mail.gmail.com
/messages/by-id/CAA4eK1KcUPwwhDVhJmdQExc09AzEBZMGbOa-u3DYaJs1zzfEnA@mail.gmail.com

Attachments:

v1-0001-toast-rewrite.patchtext/plain; charset=UTF-8; name=v1-0001-toast-rewrite.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1b4f4a528a..f1a89a3d85 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2170,7 +2170,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					 * plugin has asked for them.
 					 */
 					if (relation->rd_rel->relrewrite && !rb->output_rewrites)
+					{
+						/* Clear reassembled toast chunks if we're sure
+						 * they're not required anymore. The creator of the
+						 * tuple tells us.
+						 */
+						if (!IsToastRelation(relation) &&
+							change->data.tp.clear_toast_afterwards)
+							ReorderBufferToastReset(rb, txn);
+
 						goto change_done;
+					}
 
 					/*
 					 * For now ignore sequence changes entirely. Most of the
@@ -4609,6 +4619,11 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	if (txn->toast_hash == NULL)
 		return;
 
+	toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
+	if (!RelationIsValid(toast_rel))
+		elog(ERROR, "could not open relation with OID %u",
+			 relation->rd_rel->reltoastrelid);
+
 	/*
 	 * We're going to modify the size of the change, so to make sure the
 	 * accounting is correct we'll make it look like we're removing the change
@@ -4623,11 +4638,6 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	desc = RelationGetDescr(relation);
 
-	toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
-	if (!RelationIsValid(toast_rel))
-		elog(ERROR, "could not open relation with OID %u",
-			 relation->rd_rel->reltoastrelid);
-
 	toast_desc = RelationGetDescr(toast_rel);
 
 	/* should we allocate from stack instead? */
#2David Zhang
david.zhang@highgo.ca
In reply to: Drouvot, Bertrand (#1)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi Drouvot,
I can reproduce the issue you mentioned on REL_12_STABLE as well as Master branch, but the patch doesn't apply to REL_12_STABLE. After applied it to Master branch, it returns some wired result when run the query in the first time.
As you can see in the log below, after the first time execute the query `select * from pg_logical_slot_get_changes('bdt_slot', null, null);` it returns some extra data.
david:postgres$ psql -d postgres
psql (15devel)
Type "help" for help.

postgres=# \q
david:postgres$ psql -d postgres
psql (15devel)
Type "help" for help.

postgres=# select pg_create_logical_replication_slot('bdt_slot','test_decoding');
pg_create_logical_replication_slot
------------------------------------
(bdt_slot,0/1484FA8)
(1 row)

postgres=# CREATE TABLE tbl1 (a INT, b TEXT);
CREATE TABLE
postgres=# CREATE TABLE tbl2 (a INT);
CREATE TABLE
postgres=# ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
ALTER TABLE
postgres=#
postgres=# BEGIN;
BEGIN
postgres=*# INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
INSERT 0 1
postgres=*# ALTER TABLE tbl1 ADD COLUMN id serial primary key;
ALTER TABLE
postgres=*# INSERT INTO tbl2 VALUES(1);
INSERT 0 1
postgres=*# commit;
COMMIT
postgres=#
postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
lsn | xid |

data

-----------+-----+--------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
lsn | xid | data
-----+-----+------
(0 rows)

postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
lsn | xid | data
-----+-----+------
(0 rows)

postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
lsn | xid | data
-----+-----+------
(0 rows)

postgres=#

Thank you,
David

#3Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: David Zhang (#2)
Re: [UNVERIFIED SENDER] Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi David,

On 8/7/21 12:18 AM, David Zhang wrote:

Hi Drouvot,
I can reproduce the issue you mentioned on REL_12_STABLE as well as Master branch,

Thanks for looking at it!

but the patch doesn't apply to REL_12_STABLE.

Indeed this patch version provided is done for the current Master branch.

After applied it to Master branch, it returns some wired result when run the query in the first time.
As you can see in the log below, after the first time execute the query `select * from pg_logical_slot_get_changes('bdt_slot', null, null);` it returns some extra data.
david:postgres$ psql -d postgres
psql (15devel)
Type "help" for help.

postgres=# \q
david:postgres$ psql -d postgres
psql (15devel)
Type "help" for help.

postgres=# select pg_create_logical_replication_slot('bdt_slot','test_decoding');
pg_create_logical_replication_slot
------------------------------------
(bdt_slot,0/1484FA8)
(1 row)

postgres=# CREATE TABLE tbl1 (a INT, b TEXT);
CREATE TABLE
postgres=# CREATE TABLE tbl2 (a INT);
CREATE TABLE
postgres=# ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
ALTER TABLE
postgres=#
postgres=# BEGIN;
BEGIN
postgres=*# INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
INSERT 0 1
postgres=*# ALTER TABLE tbl1 ADD COLUMN id serial primary key;
ALTER TABLE
postgres=*# INSERT INTO tbl2 VALUES(1);
INSERT 0 1
postgres=*# commit;
COMMIT
postgres=#
postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
lsn | xid |

data

-----------+-----+--------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
lsn | xid | data
-----+-----+------
(0 rows)

postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
lsn | xid | data
-----+-----+------
(0 rows)

postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
lsn | xid | data
-----+-----+------
(0 rows)

postgres=#

I don't see extra data in your output and it looks like your copy/paste
is missing some content, no?

On my side, that looks good and here is what i get with the patch applied:

psql (15devel)
Type "help" for help.

postgres=# \i repro.sql
 pg_create_logical_replication_slot
------------------------------------
 (bdt_slot,0/172DAF0)
(1 row)

CREATE TABLE
CREATE TABLE
ALTER TABLE
BEGIN
INSERT 0 1
ALTER TABLE
INSERT 0 1
COMMIT
    lsn    | xid |

                          data

-----------+-----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------
 0/172DB40 | 708 | BEGIN 708
 0/1753298 | 708 | COMMIT 708
 0/17532C8 | 709 | BEGIN 709
 0/1754828 | 709 | COMMIT 709
 0/1754828 | 710 | BEGIN 710
 0/1754B10 | 710 | COMMIT 710
 0/1754B10 | 711 | BEGIN 711
 0/1755CA0 | 711 | table public.tbl1: INSERT: a[integer]:1
b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
 0/176F970 | 711 | table public.tbl2: INSERT: a[integer]:1
 0/1770688 | 711 | COMMIT 711
(10 rows)

Thanks

Bertrand

#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Drouvot, Bertrand (#1)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the error message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so that the logical decoding in the above repro is working).

I think instead of resetting toast_hash for this case why don't we set
'relrewrite' for toast tables as well during rewrite? If we do that
then we will simply skip assembling toast chunks for the toast table.
In make_new_heap(), we are calling NewHeapCreateToastTable() to create
toast table where we can pass additional information (probably
'toastid'), if required to set 'relrewrite'. Additionally, let's add a
test case if possible for this.

BTW, I see this as an Open Item for PG-14 [1]https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items -- With Regards, Amit Kapila. which seems wrong to me
as this is a bug from previous versions. I am not sure who added it
but do you see any reason for this to consider as an open item for
PG-14?

[1]: https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items -- With Regards, Amit Kapila.
--
With Regards,
Amit Kapila.

#5Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#4)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Mon, Aug 9, 2021 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the error message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so that the logical decoding in the above repro is working).

I think instead of resetting toast_hash for this case why don't we set
'relrewrite' for toast tables as well during rewrite? If we do that
then we will simply skip assembling toast chunks for the toast table.
In make_new_heap(), we are calling NewHeapCreateToastTable() to create
toast table where we can pass additional information (probably
'toastid'), if required to set 'relrewrite'. Additionally, let's add a
test case if possible for this.

I agree with Amit, that setting relrewrite for the toast relation as
well is better as we can simply avoid processing the toast tuple as
well in such cases.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#6Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Amit Kapila (#4)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi Amit,

On 8/9/21 10:37 AM, Amit Kapila wrote:

On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the error message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so that the logical decoding in the above repro is working).

I think instead of resetting toast_hash for this case why don't we set
'relrewrite' for toast tables as well during rewrite? If we do that
then we will simply skip assembling toast chunks for the toast table.

Thanks for looking at it!

I do agree, that would be even better than the current patch approach:
I'll work on it.

In make_new_heap(), we are calling NewHeapCreateToastTable() to create
toast table where we can pass additional information (probably
'toastid'), if required to set 'relrewrite'. Additionally, let's add a
test case if possible for this.

+ 1 for the test case, it will be added in the next version of the patch.

BTW, I see this as an Open Item for PG-14 [1] which seems wrong to me
as this is a bug from previous versions. I am not sure who added it

Me neither.

but do you see any reason for this to consider as an open item for
PG-14?

No, I don't see any reasons as this is a bug from previous versions.

Thanks

Bertrand

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Drouvot, Bertrand (#6)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi Amit,

On 8/9/21 10:37 AM, Amit Kapila wrote:

On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the error message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so that the logical decoding in the above repro is working).

I think instead of resetting toast_hash for this case why don't we set
'relrewrite' for toast tables as well during rewrite? If we do that
then we will simply skip assembling toast chunks for the toast table.

Thanks for looking at it!

I do agree, that would be even better than the current patch approach:
I'll work on it.

In make_new_heap(), we are calling NewHeapCreateToastTable() to create
toast table where we can pass additional information (probably
'toastid'), if required to set 'relrewrite'. Additionally, let's add a
test case if possible for this.

+ 1 for the test case, it will be added in the next version of the patch.

Thanks, please see, if you can prepare patches for the back-branches as well.

--
With Regards,
Amit Kapila.

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Drouvot, Bertrand (#6)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

BTW, I see this as an Open Item for PG-14 [1] which seems wrong to me
as this is a bug from previous versions. I am not sure who added it

Me neither.

but do you see any reason for this to consider as an open item for
PG-14?

No, I don't see any reasons as this is a bug from previous versions.

Thanks for the confirmation. I have moved this to the section: "Older
bugs affecting stable branches".

--
With Regards,
Amit Kapila.

#9Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Amit Kapila (#7)
1 attachment(s)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi Amit,

On 8/9/21 1:12 PM, Amit Kapila wrote:

On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi Amit,

On 8/9/21 10:37 AM, Amit Kapila wrote:

On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the error message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so that the logical decoding in the above repro is working).

I think instead of resetting toast_hash for this case why don't we set
'relrewrite' for toast tables as well during rewrite? If we do that
then we will simply skip assembling toast chunks for the toast table.

Thanks for looking at it!

I do agree, that would be even better than the current patch approach:
I'll work on it.

In make_new_heap(), we are calling NewHeapCreateToastTable() to create
toast table where we can pass additional information (probably
'toastid'), if required to set 'relrewrite'. Additionally, let's add a
test case if possible for this.

+ 1 for the test case, it will be added in the next version of the patch.

Thanks, please see, if you can prepare patches for the back-branches as well.

Please find attached the new version that:

- sets "relwrewrite" for the toast.

- contains a new test case.

As far preparing the patches for the back-branches: I will do it for
sure, but I would prefer that we agree on a polished version on current
master first.

Thanks

Bertrand

Attachments:

v2-0001-toast-rewrite.patchtext/plain; charset=UTF-8; name=v2-0001-toast-rewrite.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 75c4d22d80..dacefaa010 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -360,6 +360,29 @@ WHERE data NOT LIKE '%INSERT: %';
  COMMIT
 (4 rows)
 
+/*
+ * Test decoding relation rewrite with toast.
+ * The insert into tbl2 within the same transaction
+ * is there to check there is no remaining toast_hash
+ * not being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                                                                                                  substr                                                                                                  
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ table public.tbl2: INSERT: a[integer]:1
+ COMMIT
+(4 rows)
+
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
 --------------------------
diff --git a/contrib/test_decoding/sql/toast.sql b/contrib/test_decoding/sql/toast.sql
index 016c3ab784..c6816e7bc3 100644
--- a/contrib/test_decoding/sql/toast.sql
+++ b/contrib/test_decoding/sql/toast.sql
@@ -308,4 +308,21 @@ DROP TABLE toasted_several;
 
 SELECT regexp_replace(data, '^(.{100}).*(.{100})$', '\1..\2') FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
 WHERE data NOT LIKE '%INSERT: %';
+
+/*
+ * Test decoding relation rewrite with toast.
+ * The insert into tbl2 within the same transaction
+ * is there to check there is no remaining toast_hash
+ * not being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 147b5abc19..0db90c2011 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -36,9 +36,11 @@
 #include "utils/syscache.h"
 
 static void CheckAndCreateToastTable(Oid relOid, Datum reloptions,
-									 LOCKMODE lockmode, bool check);
+									 LOCKMODE lockmode, bool check,
+									 Oid OIDOldToast);
 static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-							   Datum reloptions, LOCKMODE lockmode, bool check);
+							   Datum reloptions, LOCKMODE lockmode, bool check,
+							   Oid OIDOldToast);
 static bool needs_toast_table(Relation rel);
 
 
@@ -57,30 +59,34 @@ static bool needs_toast_table(Relation rel);
 void
 AlterTableCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, true);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, true, InvalidOid);
 }
 
 void
-NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
+NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						Oid OIDOldToast)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, false);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, false, OIDOldToast);
 }
 
 void
 NewRelationCreateToastTable(Oid relOid, Datum reloptions)
 {
-	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false);
+	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false,
+							 InvalidOid);
 }
 
 static void
-CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode, bool check)
+CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						 bool check, Oid OIDOldToast)
 {
 	Relation	rel;
 
 	rel = table_open(relOid, lockmode);
 
 	/* create_toast_table does all the work */
-	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode, check);
+	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode,
+							  check, OIDOldToast);
 
 	table_close(rel, NoLock);
 }
@@ -104,7 +110,7 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
 
 	/* create_toast_table does all the work */
 	if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0,
-							AccessExclusiveLock, false))
+							AccessExclusiveLock, false, InvalidOid))
 		elog(ERROR, "\"%s\" does not require a toast table",
 			 relName);
 
@@ -121,7 +127,8 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
  */
 static bool
 create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-				   Datum reloptions, LOCKMODE lockmode, bool check)
+				   Datum reloptions, LOCKMODE lockmode, bool check,
+				   Oid OIDOldToast)
 {
 	Oid			relOid = RelationGetRelid(rel);
 	HeapTuple	reltup;
@@ -258,7 +265,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 										   false,
 										   true,
 										   true,
-										   InvalidOid,
+										   OIDOldToast,
 										   NULL);
 	Assert(toast_relid != InvalidOid);
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b3d8b6deb0..a5817dc5d1 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -735,7 +735,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 		if (isNull)
 			reloptions = (Datum) 0;
 
-		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode);
+		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode, toastid);
 
 		ReleaseSysCache(tuple);
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fcd778c62a..ebd2311902 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3861,6 +3861,10 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	namestrcpy(&(relform->relname), newrelname);
 
+	/* reset relrewrite for toast */
+	if (relform->relkind == RELKIND_TOASTVALUE)
+		relform->relrewrite = InvalidOid;
+
 	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
 
 	InvokeObjectPostAlterHookArg(RelationRelationId, myrelid, 0,
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index f3fa85f64e..7067c4c61b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -21,7 +21,7 @@
  */
 extern void NewRelationCreateToastTable(Oid relOid, Datum reloptions);
 extern void NewHeapCreateToastTable(Oid relOid, Datum reloptions,
-									LOCKMODE lockmode);
+									LOCKMODE lockmode, Oid OIDOldToast);
 extern void AlterTableCreateToastTable(Oid relOid, Datum reloptions,
 									   LOCKMODE lockmode);
 extern void BootstrapToastTable(char *relName,
#10Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Drouvot, Bertrand (#9)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi Amit,

On 8/10/21 1:59 PM, Drouvot, Bertrand wrote:

Hi Amit,

On 8/9/21 1:12 PM, Amit Kapila wrote:

On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand
<bdrouvot@amazon.com> wrote:

Hi Amit,

On 8/9/21 10:37 AM, Amit Kapila wrote:

On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand
<bdrouvot@amazon.com> wrote:

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the
error message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so
that the logical decoding in the above repro is working).

I think instead of resetting toast_hash for this case why don't we set
'relrewrite' for toast tables as well during rewrite? If we do that
then we will simply skip assembling toast chunks for the toast table.

Thanks for looking at it!

I do agree, that would be even better than the current patch approach:
I'll work on it.

In make_new_heap(), we are calling NewHeapCreateToastTable() to create
toast table where we can pass additional information (probably
'toastid'), if required to set 'relrewrite'. Additionally, let's add a
test case if possible for this.

+ 1 for the test case, it will be added in the next version of the
patch.

Thanks, please see, if you can prepare patches for the back-branches
as well.

Please find attached the new version that:

- sets "relwrewrite" for the toast.

- contains a new test case.

The first version of the patch contained a change in
ReorderBufferToastReplace() (to put the call to
RelationIsValid(toast_rel) and display the error message when it is not
valid before the call to ReorderBufferChangeMemoryUpdate()).

That way we also avoid the failed assertion described in the first
message of this thread (but would report the error message instead).

Forgot to mention that I did not add this change in the new patch
version as I’m thinking it would be better to create another patch for
that purpose (as not really related to toast rewrite), what do you think?

Thanks
Bertrand

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Drouvot, Bertrand (#10)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Thu, Aug 12, 2021 at 12:15 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

On 8/10/21 1:59 PM, Drouvot, Bertrand wrote:

Hi Amit,

The first version of the patch contained a change in
ReorderBufferToastReplace() (to put the call to
RelationIsValid(toast_rel) and display the error message when it is not
valid before the call to ReorderBufferChangeMemoryUpdate()).

That way we also avoid the failed assertion described in the first
message of this thread (but would report the error message instead).

Forgot to mention that I did not add this change in the new patch
version

I think that is the right decision.

--
With Regards,
Amit Kapila.

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Drouvot, Bertrand (#9)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Tue, Aug 10, 2021 at 5:30 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi Amit,

On 8/9/21 1:12 PM, Amit Kapila wrote:

On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi Amit,

On 8/9/21 10:37 AM, Amit Kapila wrote:

On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the error message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so that the logical decoding in the above repro is working).

I think instead of resetting toast_hash for this case why don't we set
'relrewrite' for toast tables as well during rewrite? If we do that
then we will simply skip assembling toast chunks for the toast table.

Thanks for looking at it!

I do agree, that would be even better than the current patch approach:
I'll work on it.

In make_new_heap(), we are calling NewHeapCreateToastTable() to create
toast table where we can pass additional information (probably
'toastid'), if required to set 'relrewrite'. Additionally, let's add a
test case if possible for this.

+ 1 for the test case, it will be added in the next version of the patch.

Thanks, please see, if you can prepare patches for the back-branches as well.

Please find attached the new version that:

- sets "relwrewrite" for the toast.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3861,6 +3861,10 @@ RenameRelationInternal(Oid myrelid, const char
*newrelname, bool is_internal, bo
  */
  namestrcpy(&(relform->relname), newrelname);
+ /* reset relrewrite for toast */
+ if (relform->relkind == RELKIND_TOASTVALUE)
+ relform->relrewrite = InvalidOid;
+

I find this change quite ad-hoc. I think this API is quite generic to
make such a change. I see two ways for this (a) pass a new bool flag
(reset_toast_rewrite) in this API and then make this change, (b) in
the specific place where we need this, change relrewrite separately
via a new API.

I would prefer (b) in the ideal case, but I understand it would be an
additional cost, so maybe (a) is also okay. What do you people think?

--
With Regards,
Amit Kapila.

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#12)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Thu, Aug 12, 2021 at 4:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 10, 2021 at 5:30 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Please find attached the new version that:

- sets "relwrewrite" for the toast.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3861,6 +3861,10 @@ RenameRelationInternal(Oid myrelid, const char
*newrelname, bool is_internal, bo
*/
namestrcpy(&(relform->relname), newrelname);
+ /* reset relrewrite for toast */
+ if (relform->relkind == RELKIND_TOASTVALUE)
+ relform->relrewrite = InvalidOid;
+

I find this change quite ad-hoc. I think this API is quite generic to
make such a change. I see two ways for this (a) pass a new bool flag
(reset_toast_rewrite) in this API and then make this change, (b) in
the specific place where we need this, change relrewrite separately
via a new API.

I would prefer (b) in the ideal case, but I understand it would be an
additional cost, so maybe (a) is also okay. What do you people think?

One minor comment:
+/*
+ * Test decoding relation rewrite with toast.
+ * The insert into tbl2 within the same transaction
+ * is there to check there is no remaining toast_hash
+ * not being reset.
+ */

You can extend each line of comment up to 80 chars. The current one
looks a bit odd.

--
With Regards,
Amit Kapila.

#14Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Amit Kapila (#12)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi,

On 8/12/21 1:00 PM, Amit Kapila wrote:

On Tue, Aug 10, 2021 at 5:30 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi Amit,

On 8/9/21 1:12 PM, Amit Kapila wrote:

On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi Amit,

On 8/9/21 10:37 AM, Amit Kapila wrote:

On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the error message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so that the logical decoding in the above repro is working).

I think instead of resetting toast_hash for this case why don't we set
'relrewrite' for toast tables as well during rewrite? If we do that
then we will simply skip assembling toast chunks for the toast table.

Thanks for looking at it!

I do agree, that would be even better than the current patch approach:
I'll work on it.

In make_new_heap(), we are calling NewHeapCreateToastTable() to create
toast table where we can pass additional information (probably
'toastid'), if required to set 'relrewrite'. Additionally, let's add a
test case if possible for this.

+ 1 for the test case, it will be added in the next version of the patch.

Thanks, please see, if you can prepare patches for the back-branches as well.

Please find attached the new version that:

- sets "relwrewrite" for the toast.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3861,6 +3861,10 @@ RenameRelationInternal(Oid myrelid, const char
*newrelname, bool is_internal, bo
*/
namestrcpy(&(relform->relname), newrelname);
+ /* reset relrewrite for toast */
+ if (relform->relkind == RELKIND_TOASTVALUE)
+ relform->relrewrite = InvalidOid;
+

I find this change quite ad-hoc. I think this API is quite generic to
make such a change. I see two ways for this (a) pass a new bool flag
(reset_toast_rewrite) in this API and then make this change, (b) in
the specific place where we need this, change relrewrite separately
via a new API.

I would prefer (b) in the ideal case, but I understand it would be an
additional cost, so maybe (a) is also okay. What do you people think?

I would prefer a) because:

- b) would need to update the exact same tuple one more time (means
doing more or less the same work: open relation, search for the tuple,
update the tuple...)

- a) would still give the ability for someone reading the code to
understand where the relrewrite reset is needed (as opposed to the way
the patch is currently written)

- finish_heap_swap() with swap_toast_by_content set to false, is the
only place where we currently need to reset explicitly relrewrite (so
that we would have the new API produced by b) being called only at that
place).

- That means that b) would be only for code readability but at the price
of extra cost.

That said, I think we can go with a) and rethink about b) later on if
there is a need of this new API in other places for other reasons.

Thanks

Bertrand

#15Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Amit Kapila (#13)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On 8/12/21 1:28 PM, Amit Kapila wrote:

On Thu, Aug 12, 2021 at 4:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 10, 2021 at 5:30 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Please find attached the new version that:

- sets "relwrewrite" for the toast.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3861,6 +3861,10 @@ RenameRelationInternal(Oid myrelid, const char
*newrelname, bool is_internal, bo
*/
namestrcpy(&(relform->relname), newrelname);
+ /* reset relrewrite for toast */
+ if (relform->relkind == RELKIND_TOASTVALUE)
+ relform->relrewrite = InvalidOid;
+

I find this change quite ad-hoc. I think this API is quite generic to
make such a change. I see two ways for this (a) pass a new bool flag
(reset_toast_rewrite) in this API and then make this change, (b) in
the specific place where we need this, change relrewrite separately
via a new API.

I would prefer (b) in the ideal case, but I understand it would be an
additional cost, so maybe (a) is also okay. What do you people think?

One minor comment:
+/*
+ * Test decoding relation rewrite with toast.
+ * The insert into tbl2 within the same transaction
+ * is there to check there is no remaining toast_hash
+ * not being reset.
+ */

You can extend each line of comment up to 80 chars. The current one
looks a bit odd.

Thanks. I'll update the patch and comment that way once we have decided
if we are going for a) or b).

Thanks

Bertrand

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Drouvot, Bertrand (#14)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Fri, Aug 13, 2021 at 11:47 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

On 8/12/21 1:00 PM, Amit Kapila wrote:

- sets "relwrewrite" for the toast.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3861,6 +3861,10 @@ RenameRelationInternal(Oid myrelid, const char
*newrelname, bool is_internal, bo
*/
namestrcpy(&(relform->relname), newrelname);
+ /* reset relrewrite for toast */
+ if (relform->relkind == RELKIND_TOASTVALUE)
+ relform->relrewrite = InvalidOid;
+

I find this change quite ad-hoc. I think this API is quite generic to
make such a change. I see two ways for this (a) pass a new bool flag
(reset_toast_rewrite) in this API and then make this change, (b) in
the specific place where we need this, change relrewrite separately
via a new API.

I would prefer (b) in the ideal case, but I understand it would be an
additional cost, so maybe (a) is also okay. What do you people think?

I would prefer a) because:

- b) would need to update the exact same tuple one more time (means
doing more or less the same work: open relation, search for the tuple,
update the tuple...)

- a) would still give the ability for someone reading the code to
understand where the relrewrite reset is needed (as opposed to the way
the patch is currently written)

- finish_heap_swap() with swap_toast_by_content set to false, is the
only place where we currently need to reset explicitly relrewrite (so
that we would have the new API produced by b) being called only at that
place).

- That means that b) would be only for code readability but at the price
of extra cost.

Anybody else would like to weigh in? I think it is good if few others
also share their opinion as we need to backpatch this bug-fix.

--
With Regards,
Amit Kapila.

#17David Zhang
david.zhang@highgo.ca
In reply to: Drouvot, Bertrand (#3)
Re: [UNVERIFIED SENDER] Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi Drouvot,

I don't see extra data in your output and it looks like your
copy/paste is missing some content, no?

On my side, that looks good and here is what i get with the patch applied:

I ran the test again, now I got the same output as yours, and it looks
good for me. (The issue I mentioned in previous email was caused by my
console output.)

Thank you,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca

#18Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Amit Kapila (#16)
1 attachment(s)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi,

On 8/13/21 11:17 AM, Amit Kapila wrote:

On Fri, Aug 13, 2021 at 11:47 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

On 8/12/21 1:00 PM, Amit Kapila wrote:

- sets "relwrewrite" for the toast.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3861,6 +3861,10 @@ RenameRelationInternal(Oid myrelid, const char
*newrelname, bool is_internal, bo
*/
namestrcpy(&(relform->relname), newrelname);
+ /* reset relrewrite for toast */
+ if (relform->relkind == RELKIND_TOASTVALUE)
+ relform->relrewrite = InvalidOid;
+

I find this change quite ad-hoc. I think this API is quite generic to
make such a change. I see two ways for this (a) pass a new bool flag
(reset_toast_rewrite) in this API and then make this change, (b) in
the specific place where we need this, change relrewrite separately
via a new API.

I would prefer (b) in the ideal case, but I understand it would be an
additional cost, so maybe (a) is also okay. What do you people think?

I would prefer a) because:

- b) would need to update the exact same tuple one more time (means
doing more or less the same work: open relation, search for the tuple,
update the tuple...)

- a) would still give the ability for someone reading the code to
understand where the relrewrite reset is needed (as opposed to the way
the patch is currently written)

- finish_heap_swap() with swap_toast_by_content set to false, is the
only place where we currently need to reset explicitly relrewrite (so
that we would have the new API produced by b) being called only at that
place).

- That means that b) would be only for code readability but at the price
of extra cost.

Anybody else would like to weigh in? I think it is good if few others
also share their opinion as we need to backpatch this bug-fix.

I had a second thoughts on it and now think option b) is better.

It would make the code clearer by using a new API and the extra cost of
it (mainly search again for the pg_class tuple and update it) should be ok.

Please find patch version V3 making use of a new API.

Thanks

Bertrand

Attachments:

v3-0001-toast-rewrite.patchtext/plain; charset=UTF-8; name=v3-0001-toast-rewrite.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 75c4d22d80..cd03e9d50a 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -360,6 +360,28 @@ WHERE data NOT LIKE '%INSERT: %';
  COMMIT
 (4 rows)
 
+/*
+ * Test decoding relation rewrite with toast. The insert into tbl2 within the
+ * same transaction is there to check that there is no remaining toast_hash not
+ * being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                                                                                                  substr                                                                                                  
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ table public.tbl2: INSERT: a[integer]:1
+ COMMIT
+(4 rows)
+
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
 --------------------------
diff --git a/contrib/test_decoding/sql/toast.sql b/contrib/test_decoding/sql/toast.sql
index 016c3ab784..d1c560a174 100644
--- a/contrib/test_decoding/sql/toast.sql
+++ b/contrib/test_decoding/sql/toast.sql
@@ -308,4 +308,20 @@ DROP TABLE toasted_several;
 
 SELECT regexp_replace(data, '^(.{100}).*(.{100})$', '\1..\2') FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
 WHERE data NOT LIKE '%INSERT: %';
+
+/*
+ * Test decoding relation rewrite with toast. The insert into tbl2 within the
+ * same transaction is there to check that there is no remaining toast_hash not
+ * being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 147b5abc19..0db90c2011 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -36,9 +36,11 @@
 #include "utils/syscache.h"
 
 static void CheckAndCreateToastTable(Oid relOid, Datum reloptions,
-									 LOCKMODE lockmode, bool check);
+									 LOCKMODE lockmode, bool check,
+									 Oid OIDOldToast);
 static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-							   Datum reloptions, LOCKMODE lockmode, bool check);
+							   Datum reloptions, LOCKMODE lockmode, bool check,
+							   Oid OIDOldToast);
 static bool needs_toast_table(Relation rel);
 
 
@@ -57,30 +59,34 @@ static bool needs_toast_table(Relation rel);
 void
 AlterTableCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, true);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, true, InvalidOid);
 }
 
 void
-NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
+NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						Oid OIDOldToast)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, false);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, false, OIDOldToast);
 }
 
 void
 NewRelationCreateToastTable(Oid relOid, Datum reloptions)
 {
-	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false);
+	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false,
+							 InvalidOid);
 }
 
 static void
-CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode, bool check)
+CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						 bool check, Oid OIDOldToast)
 {
 	Relation	rel;
 
 	rel = table_open(relOid, lockmode);
 
 	/* create_toast_table does all the work */
-	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode, check);
+	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode,
+							  check, OIDOldToast);
 
 	table_close(rel, NoLock);
 }
@@ -104,7 +110,7 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
 
 	/* create_toast_table does all the work */
 	if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0,
-							AccessExclusiveLock, false))
+							AccessExclusiveLock, false, InvalidOid))
 		elog(ERROR, "\"%s\" does not require a toast table",
 			 relName);
 
@@ -121,7 +127,8 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
  */
 static bool
 create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-				   Datum reloptions, LOCKMODE lockmode, bool check)
+				   Datum reloptions, LOCKMODE lockmode, bool check,
+				   Oid OIDOldToast)
 {
 	Oid			relOid = RelationGetRelid(rel);
 	HeapTuple	reltup;
@@ -258,7 +265,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 										   false,
 										   true,
 										   true,
-										   InvalidOid,
+										   OIDOldToast,
 										   NULL);
 	Assert(toast_relid != InvalidOid);
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 39185cc5b9..94a9f78967 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -735,7 +735,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 		if (isNull)
 			reloptions = (Datum) 0;
 
-		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode);
+		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode, toastid);
 
 		ReleaseSysCache(tuple);
 	}
@@ -1526,6 +1526,16 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 			RenameRelationInternal(toastidx,
 								   NewToastName, true, true);
+
+			/*
+			 * Reset the relrewrite for the toast. We need to call
+			 * CommandCounterIncrement() first to avoid the
+			 * "tuple already updated by self" error. Indeed the exact same
+			 * pg_class tuple has already been updated while
+			 * calling RenameRelationInternal().
+			 */
+			CommandCounterIncrement();
+			ResetRelRewrite(newrel->rd_rel->reltoastrelid);
 		}
 		relation_close(newrel, NoLock);
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a03077139d..dbee6ae199 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3849,6 +3849,37 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	relation_close(targetrelation, NoLock);
 }
 
+/*
+ *		ResetRelRewrite - reset relrewrite
+ */
+void
+ResetRelRewrite(Oid myrelid)
+{
+	Relation	relrelation;	/* for RELATION relation */
+	HeapTuple	reltup;
+	Form_pg_class relform;
+
+	/*
+	 * Find relation's pg_class tuple.
+	 */
+	relrelation = table_open(RelationRelationId, RowExclusiveLock);
+
+	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
+		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	relform = (Form_pg_class) GETSTRUCT(reltup);
+
+	/*
+	 * Update pg_class tuple.
+	 */
+	relform->relrewrite = InvalidOid;
+
+	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+
+	heap_freetuple(reltup);
+	table_close(relrelation, RowExclusiveLock);
+}
+
 /*
  * Disallow ALTER TABLE (and similar commands) when the current backend has
  * any open reference to the target table besides the one just acquired by
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index f3fa85f64e..7067c4c61b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -21,7 +21,7 @@
  */
 extern void NewRelationCreateToastTable(Oid relOid, Datum reloptions);
 extern void NewHeapCreateToastTable(Oid relOid, Datum reloptions,
-									LOCKMODE lockmode);
+									LOCKMODE lockmode, Oid OIDOldToast);
 extern void AlterTableCreateToastTable(Oid relOid, Datum reloptions,
 									   LOCKMODE lockmode);
 extern void BootstrapToastTable(char *relName,
diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h
index 14f4b4882f..336549cc5f 100644
--- a/src/include/commands/tablecmds.h
+++ b/src/include/commands/tablecmds.h
@@ -78,6 +78,8 @@ extern void RenameRelationInternal(Oid myrelid,
 								   const char *newrelname, bool is_internal,
 								   bool is_index);
 
+extern void ResetRelRewrite(Oid myrelid);
+
 extern void find_composite_type_dependencies(Oid typeOid,
 											 Relation origRelation,
 											 const char *origTypeName);
#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Drouvot, Bertrand (#18)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Wed, Aug 18, 2021 at 1:27 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,

On 8/13/21 11:17 AM, Amit Kapila wrote:

On Fri, Aug 13, 2021 at 11:47 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

On 8/12/21 1:00 PM, Amit Kapila wrote:

- sets "relwrewrite" for the toast.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3861,6 +3861,10 @@ RenameRelationInternal(Oid myrelid, const char
*newrelname, bool is_internal, bo
*/
namestrcpy(&(relform->relname), newrelname);
+ /* reset relrewrite for toast */
+ if (relform->relkind == RELKIND_TOASTVALUE)
+ relform->relrewrite = InvalidOid;
+

I find this change quite ad-hoc. I think this API is quite generic to
make such a change. I see two ways for this (a) pass a new bool flag
(reset_toast_rewrite) in this API and then make this change, (b) in
the specific place where we need this, change relrewrite separately
via a new API.

I would prefer (b) in the ideal case, but I understand it would be an
additional cost, so maybe (a) is also okay. What do you people think?

I would prefer a) because:

- b) would need to update the exact same tuple one more time (means
doing more or less the same work: open relation, search for the tuple,
update the tuple...)

- a) would still give the ability for someone reading the code to
understand where the relrewrite reset is needed (as opposed to the way
the patch is currently written)

- finish_heap_swap() with swap_toast_by_content set to false, is the
only place where we currently need to reset explicitly relrewrite (so
that we would have the new API produced by b) being called only at that
place).

- That means that b) would be only for code readability but at the price
of extra cost.

Anybody else would like to weigh in? I think it is good if few others
also share their opinion as we need to backpatch this bug-fix.

I had a second thoughts on it and now think option b) is better.

It would make the code clearer by using a new API and the extra cost of
it (mainly search again for the pg_class tuple and update it) should be ok.

I agree especially because I am not very comfortable changing the
RenameRelationInternal() API in back branches. One minor comment:

+
+ /*
+ * Reset the relrewrite for the toast. We need to call
+ * CommandCounterIncrement() first to avoid the
+ * "tuple already updated by self" error. Indeed the exact same
+ * pg_class tuple has already been updated while
+ * calling RenameRelationInternal().
+ */
+ CommandCounterIncrement();

It would be better if we can write the above comment as "The
command-counter increment is required here as we are about to update
the tuple that is updated as part of RenameRelationInternal." or
something like that.

I would like to push and back-patch the proposed patch (after some
minor edits in the comments) unless someone thinks otherwise.

--
With Regards,
Amit Kapila.

#20Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Amit Kapila (#19)
3 attachment(s)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi,

On 8/18/21 12:01 PM, Amit Kapila wrote:

On Wed, Aug 18, 2021 at 1:27 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,

On 8/13/21 11:17 AM, Amit Kapila wrote:

On Fri, Aug 13, 2021 at 11:47 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

On 8/12/21 1:00 PM, Amit Kapila wrote:

- sets "relwrewrite" for the toast.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3861,6 +3861,10 @@ RenameRelationInternal(Oid myrelid, const char
*newrelname, bool is_internal, bo
*/
namestrcpy(&(relform->relname), newrelname);
+ /* reset relrewrite for toast */
+ if (relform->relkind == RELKIND_TOASTVALUE)
+ relform->relrewrite = InvalidOid;
+

I find this change quite ad-hoc. I think this API is quite generic to
make such a change. I see two ways for this (a) pass a new bool flag
(reset_toast_rewrite) in this API and then make this change, (b) in
the specific place where we need this, change relrewrite separately
via a new API.

I would prefer (b) in the ideal case, but I understand it would be an
additional cost, so maybe (a) is also okay. What do you people think?

I would prefer a) because:

- b) would need to update the exact same tuple one more time (means
doing more or less the same work: open relation, search for the tuple,
update the tuple...)

- a) would still give the ability for someone reading the code to
understand where the relrewrite reset is needed (as opposed to the way
the patch is currently written)

- finish_heap_swap() with swap_toast_by_content set to false, is the
only place where we currently need to reset explicitly relrewrite (so
that we would have the new API produced by b) being called only at that
place).

- That means that b) would be only for code readability but at the price
of extra cost.

Anybody else would like to weigh in? I think it is good if few others
also share their opinion as we need to backpatch this bug-fix.

I had a second thoughts on it and now think option b) is better.

It would make the code clearer by using a new API and the extra cost of
it (mainly search again for the pg_class tuple and update it) should be ok.

I agree especially because I am not very comfortable changing the
RenameRelationInternal() API in back branches. One minor comment:

+
+ /*
+ * Reset the relrewrite for the toast. We need to call
+ * CommandCounterIncrement() first to avoid the
+ * "tuple already updated by self" error. Indeed the exact same
+ * pg_class tuple has already been updated while
+ * calling RenameRelationInternal().
+ */
+ CommandCounterIncrement();

It would be better if we can write the above comment as "The
command-counter increment is required here as we are about to update
the tuple that is updated as part of RenameRelationInternal." or
something like that.

I would like to push and back-patch the proposed patch (after some
minor edits in the comments) unless someone thinks otherwise.

Thanks!

I've updated the comment and prepared the back patch versions:

Please find attached:

v4-0001-toast-rewrite-master-branch.patch: to be applied on the master
and REL_14_STABLE branches

v4-0001-toast-rewrite-13-stable-branch.patch: to be applied on the
REL_13_STABLE and REL_12_STABLE branches

v4-0001-toast-rewrite-11-stable-branch.patch: to be applied on the
REL_11_STABLE branch

I stopped the back patching here as the issue has been introduced in
325f2ec555.

Thanks

Bertrand

Attachments:

v4-0001-toast-rewrite-13-stable-branch.patchtext/plain; charset=UTF-8; name=v4-0001-toast-rewrite-13-stable-branch.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 75c4d22d80..cd03e9d50a 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -360,6 +360,28 @@ WHERE data NOT LIKE '%INSERT: %';
  COMMIT
 (4 rows)
 
+/*
+ * Test decoding relation rewrite with toast. The insert into tbl2 within the
+ * same transaction is there to check that there is no remaining toast_hash not
+ * being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                                                                                                  substr                                                                                                  
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ table public.tbl2: INSERT: a[integer]:1
+ COMMIT
+(4 rows)
+
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
 --------------------------
diff --git a/contrib/test_decoding/sql/toast.sql b/contrib/test_decoding/sql/toast.sql
index 016c3ab784..d1c560a174 100644
--- a/contrib/test_decoding/sql/toast.sql
+++ b/contrib/test_decoding/sql/toast.sql
@@ -308,4 +308,20 @@ DROP TABLE toasted_several;
 
 SELECT regexp_replace(data, '^(.{100}).*(.{100})$', '\1..\2') FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
 WHERE data NOT LIKE '%INSERT: %';
+
+/*
+ * Test decoding relation rewrite with toast. The insert into tbl2 within the
+ * same transaction is there to check that there is no remaining toast_hash not
+ * being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 3f7ab8d389..2d4a7e1b68 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -38,9 +38,11 @@
 Oid			binary_upgrade_next_toast_pg_type_oid = InvalidOid;
 
 static void CheckAndCreateToastTable(Oid relOid, Datum reloptions,
-									 LOCKMODE lockmode, bool check);
+									 LOCKMODE lockmode, bool check,
+									 Oid OIDOldToast);
 static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-							   Datum reloptions, LOCKMODE lockmode, bool check);
+							   Datum reloptions, LOCKMODE lockmode, bool check,
+							   Oid OIDOldToast);
 static bool needs_toast_table(Relation rel);
 
 
@@ -59,30 +61,34 @@ static bool needs_toast_table(Relation rel);
 void
 AlterTableCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, true);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, true, InvalidOid);
 }
 
 void
-NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
+NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						Oid OIDOldToast)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, false);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, false, OIDOldToast);
 }
 
 void
 NewRelationCreateToastTable(Oid relOid, Datum reloptions)
 {
-	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false);
+	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false,
+							 InvalidOid);
 }
 
 static void
-CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode, bool check)
+CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						 bool check, Oid OIDOldToast)
 {
 	Relation	rel;
 
 	rel = table_open(relOid, lockmode);
 
 	/* create_toast_table does all the work */
-	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode, check);
+	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode,
+							  check, OIDOldToast);
 
 	table_close(rel, NoLock);
 }
@@ -108,7 +114,7 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
 
 	/* create_toast_table does all the work */
 	if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0,
-							AccessExclusiveLock, false))
+							AccessExclusiveLock, false, InvalidOid))
 		elog(ERROR, "\"%s\" does not require a toast table",
 			 relName);
 
@@ -125,7 +131,8 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
  */
 static bool
 create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-				   Datum reloptions, LOCKMODE lockmode, bool check)
+				   Datum reloptions, LOCKMODE lockmode, bool check,
+				   Oid OIDOldToast)
 {
 	Oid			relOid = RelationGetRelid(rel);
 	HeapTuple	reltup;
@@ -270,7 +277,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 										   false,
 										   true,
 										   true,
-										   InvalidOid,
+										   OIDOldToast,
 										   NULL);
 	Assert(toast_relid != InvalidOid);
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 04d12a7ece..2b042978db 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -709,7 +709,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 		if (isNull)
 			reloptions = (Datum) 0;
 
-		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode);
+		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode, toastid);
 
 		ReleaseSysCache(tuple);
 	}
@@ -1487,6 +1487,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 			RenameRelationInternal(toastidx,
 								   NewToastName, true, true);
+
+			/*
+			 * Reset the relrewrite for the toast. The command-counter
+			 * increment is required here as we are about to update
+			 * the tuple that is updated as part of RenameRelationInternal.
+			 */
+			CommandCounterIncrement();
+			ResetRelRewrite(newrel->rd_rel->reltoastrelid);
 		}
 		relation_close(newrel, NoLock);
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8784bec0e6..b149f2e9d8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3562,6 +3562,37 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	relation_close(targetrelation, NoLock);
 }
 
+/*
+ *		ResetRelRewrite - reset relrewrite
+ */
+void
+ResetRelRewrite(Oid myrelid)
+{
+	Relation	relrelation;	/* for RELATION relation */
+	HeapTuple	reltup;
+	Form_pg_class relform;
+
+	/*
+	 * Find relation's pg_class tuple.
+	 */
+	relrelation = table_open(RelationRelationId, RowExclusiveLock);
+
+	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
+		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	relform = (Form_pg_class) GETSTRUCT(reltup);
+
+	/*
+	 * Update pg_class tuple.
+	 */
+	relform->relrewrite = InvalidOid;
+
+	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+
+	heap_freetuple(reltup);
+	table_close(relrelation, RowExclusiveLock);
+}
+
 /*
  * Disallow ALTER TABLE (and similar commands) when the current backend has
  * any open reference to the target table besides the one just acquired by
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index 51491c4513..bde14263a1 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -24,7 +24,7 @@
  */
 extern void NewRelationCreateToastTable(Oid relOid, Datum reloptions);
 extern void NewHeapCreateToastTable(Oid relOid, Datum reloptions,
-									LOCKMODE lockmode);
+									LOCKMODE lockmode, Oid OIDOldToast);
 extern void AlterTableCreateToastTable(Oid relOid, Datum reloptions,
 									   LOCKMODE lockmode);
 extern void BootstrapToastTable(char *relName,
diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h
index c1581ad178..e01a1715d5 100644
--- a/src/include/commands/tablecmds.h
+++ b/src/include/commands/tablecmds.h
@@ -71,6 +71,8 @@ extern void RenameRelationInternal(Oid myrelid,
 								   const char *newrelname, bool is_internal,
 								   bool is_index);
 
+extern void ResetRelRewrite(Oid myrelid);
+
 extern void find_composite_type_dependencies(Oid typeOid,
 											 Relation origRelation,
 											 const char *origTypeName);
v4-0001-toast-rewrite-11-stable-branch.patchtext/plain; charset=UTF-8; name=v4-0001-toast-rewrite-11-stable-branch.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 75c4d22d80..cd03e9d50a 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -360,6 +360,28 @@ WHERE data NOT LIKE '%INSERT: %';
  COMMIT
 (4 rows)
 
+/*
+ * Test decoding relation rewrite with toast. The insert into tbl2 within the
+ * same transaction is there to check that there is no remaining toast_hash not
+ * being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                                                                                                  substr                                                                                                  
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ table public.tbl2: INSERT: a[integer]:1
+ COMMIT
+(4 rows)
+
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
 --------------------------
diff --git a/contrib/test_decoding/sql/toast.sql b/contrib/test_decoding/sql/toast.sql
index 016c3ab784..d1c560a174 100644
--- a/contrib/test_decoding/sql/toast.sql
+++ b/contrib/test_decoding/sql/toast.sql
@@ -308,4 +308,20 @@ DROP TABLE toasted_several;
 
 SELECT regexp_replace(data, '^(.{100}).*(.{100})$', '\1..\2') FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
 WHERE data NOT LIKE '%INSERT: %';
+
+/*
+ * Test decoding relation rewrite with toast. The insert into tbl2 within the
+ * same transaction is there to check that there is no remaining toast_hash not
+ * being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 3baaa08238..92b0b7cd2e 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -37,9 +37,11 @@
 Oid			binary_upgrade_next_toast_pg_type_oid = InvalidOid;
 
 static void CheckAndCreateToastTable(Oid relOid, Datum reloptions,
-						 LOCKMODE lockmode, bool check);
+						 LOCKMODE lockmode, bool check,
+						 Oid OIDOldToast);
 static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-				   Datum reloptions, LOCKMODE lockmode, bool check);
+				   Datum reloptions, LOCKMODE lockmode, bool check,
+				   Oid OIDOldToast);
 static bool needs_toast_table(Relation rel);
 
 
@@ -58,30 +60,34 @@ static bool needs_toast_table(Relation rel);
 void
 AlterTableCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, true);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, true, InvalidOid);
 }
 
 void
-NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
+NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						Oid OIDOldToast)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, false);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, false, OIDOldToast);
 }
 
 void
 NewRelationCreateToastTable(Oid relOid, Datum reloptions)
 {
-	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false);
+	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false,
+							 InvalidOid);
 }
 
 static void
-CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode, bool check)
+CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						 bool check, Oid OIDOldToast)
 {
 	Relation	rel;
 
 	rel = heap_open(relOid, lockmode);
 
 	/* create_toast_table does all the work */
-	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode, check);
+	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode,
+							  check, OIDOldToast);
 
 	heap_close(rel, NoLock);
 }
@@ -107,7 +113,7 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
 
 	/* create_toast_table does all the work */
 	if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0,
-							AccessExclusiveLock, false))
+							AccessExclusiveLock, false, InvalidOid))
 		elog(ERROR, "\"%s\" does not require a toast table",
 			 relName);
 
@@ -124,7 +130,8 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
  */
 static bool
 create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-				   Datum reloptions, LOCKMODE lockmode, bool check)
+				   Datum reloptions, LOCKMODE lockmode, bool check,
+				   Oid OIDOldToast)
 {
 	Oid			relOid = RelationGetRelid(rel);
 	HeapTuple	reltup;
@@ -279,7 +286,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 										   false,
 										   true,
 										   true,
-										   InvalidOid,
+										   OIDOldToast,
 										   NULL);
 	Assert(toast_relid != InvalidOid);
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 0a425e1018..ac7277e2de 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -727,7 +727,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 		if (isNull)
 			reloptions = (Datum) 0;
 
-		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode);
+		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode, toastid);
 
 		ReleaseSysCache(tuple);
 	}
@@ -1666,6 +1666,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 			RenameRelationInternal(toastidx,
 								   NewToastName, true);
+
+			/*
+			 * Reset the relrewrite for the toast. The command-counter
+			 * increment is required here as we are about to update
+			 * the tuple that is updated as part of RenameRelationInternal.
+			 */
+			CommandCounterIncrement();
+			ResetRelRewrite(newrel->rd_rel->reltoastrelid);
 		}
 		relation_close(newrel, NoLock);
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cd4ce802a2..166a04c83a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3268,6 +3268,37 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal)
 	relation_close(targetrelation, NoLock);
 }
 
+/*
+ *		ResetRelRewrite - reset relrewrite
+ */
+void
+ResetRelRewrite(Oid myrelid)
+{
+	Relation	relrelation;	/* for RELATION relation */
+	HeapTuple	reltup;
+	Form_pg_class relform;
+
+	/*
+	 * Find relation's pg_class tuple.
+	 */
+	relrelation = heap_open(RelationRelationId, RowExclusiveLock);
+
+	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
+		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	relform = (Form_pg_class) GETSTRUCT(reltup);
+
+	/*
+	 * Update pg_class tuple.
+	 */
+	relform->relrewrite = InvalidOid;
+
+	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+
+	heap_freetuple(reltup);
+	heap_close(relrelation, RowExclusiveLock);
+}
+
 /*
  * Disallow ALTER TABLE (and similar commands) when the current backend has
  * any open reference to the target table besides the one just acquired by
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index 3db39b8f86..78b8772722 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -21,7 +21,7 @@
  */
 extern void NewRelationCreateToastTable(Oid relOid, Datum reloptions);
 extern void NewHeapCreateToastTable(Oid relOid, Datum reloptions,
-						LOCKMODE lockmode);
+						LOCKMODE lockmode, Oid OIDOldToast);
 extern void AlterTableCreateToastTable(Oid relOid, Datum reloptions,
 						   LOCKMODE lockmode);
 extern void BootstrapToastTable(char *relName,
diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h
index ac93bd3de8..397256ce78 100644
--- a/src/include/commands/tablecmds.h
+++ b/src/include/commands/tablecmds.h
@@ -69,6 +69,8 @@ extern ObjectAddress RenameRelation(RenameStmt *stmt);
 extern void RenameRelationInternal(Oid myrelid,
 					   const char *newrelname, bool is_internal);
 
+extern void ResetRelRewrite(Oid myrelid);
+
 extern void find_composite_type_dependencies(Oid typeOid,
 								 Relation origRelation,
 								 const char *origTypeName);
v4-0001-toast-rewrite-master-branch.patchtext/plain; charset=UTF-8; name=v4-0001-toast-rewrite-master-branch.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 75c4d22d80..cd03e9d50a 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -360,6 +360,28 @@ WHERE data NOT LIKE '%INSERT: %';
  COMMIT
 (4 rows)
 
+/*
+ * Test decoding relation rewrite with toast. The insert into tbl2 within the
+ * same transaction is there to check that there is no remaining toast_hash not
+ * being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                                                                                                  substr                                                                                                  
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ BEGIN
+ table public.tbl1: INSERT: a[integer]:1 b[text]:'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ table public.tbl2: INSERT: a[integer]:1
+ COMMIT
+(4 rows)
+
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
 --------------------------
diff --git a/contrib/test_decoding/sql/toast.sql b/contrib/test_decoding/sql/toast.sql
index 016c3ab784..d1c560a174 100644
--- a/contrib/test_decoding/sql/toast.sql
+++ b/contrib/test_decoding/sql/toast.sql
@@ -308,4 +308,20 @@ DROP TABLE toasted_several;
 
 SELECT regexp_replace(data, '^(.{100}).*(.{100})$', '\1..\2') FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
 WHERE data NOT LIKE '%INSERT: %';
+
+/*
+ * Test decoding relation rewrite with toast. The insert into tbl2 within the
+ * same transaction is there to check that there is no remaining toast_hash not
+ * being reset.
+ */
+CREATE TABLE tbl1 (a INT, b TEXT);
+CREATE TABLE tbl2 (a INT);
+ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
+BEGIN;
+INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
+ALTER TABLE tbl1 ADD COLUMN id serial primary key;
+INSERT INTO tbl2 VALUES(1);
+commit;
+SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 147b5abc19..0db90c2011 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -36,9 +36,11 @@
 #include "utils/syscache.h"
 
 static void CheckAndCreateToastTable(Oid relOid, Datum reloptions,
-									 LOCKMODE lockmode, bool check);
+									 LOCKMODE lockmode, bool check,
+									 Oid OIDOldToast);
 static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-							   Datum reloptions, LOCKMODE lockmode, bool check);
+							   Datum reloptions, LOCKMODE lockmode, bool check,
+							   Oid OIDOldToast);
 static bool needs_toast_table(Relation rel);
 
 
@@ -57,30 +59,34 @@ static bool needs_toast_table(Relation rel);
 void
 AlterTableCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, true);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, true, InvalidOid);
 }
 
 void
-NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
+NewHeapCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						Oid OIDOldToast)
 {
-	CheckAndCreateToastTable(relOid, reloptions, lockmode, false);
+	CheckAndCreateToastTable(relOid, reloptions, lockmode, false, OIDOldToast);
 }
 
 void
 NewRelationCreateToastTable(Oid relOid, Datum reloptions)
 {
-	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false);
+	CheckAndCreateToastTable(relOid, reloptions, AccessExclusiveLock, false,
+							 InvalidOid);
 }
 
 static void
-CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode, bool check)
+CheckAndCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode,
+						 bool check, Oid OIDOldToast)
 {
 	Relation	rel;
 
 	rel = table_open(relOid, lockmode);
 
 	/* create_toast_table does all the work */
-	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode, check);
+	(void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions, lockmode,
+							  check, OIDOldToast);
 
 	table_close(rel, NoLock);
 }
@@ -104,7 +110,7 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
 
 	/* create_toast_table does all the work */
 	if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0,
-							AccessExclusiveLock, false))
+							AccessExclusiveLock, false, InvalidOid))
 		elog(ERROR, "\"%s\" does not require a toast table",
 			 relName);
 
@@ -121,7 +127,8 @@ BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid)
  */
 static bool
 create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
-				   Datum reloptions, LOCKMODE lockmode, bool check)
+				   Datum reloptions, LOCKMODE lockmode, bool check,
+				   Oid OIDOldToast)
 {
 	Oid			relOid = RelationGetRelid(rel);
 	HeapTuple	reltup;
@@ -258,7 +265,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 										   false,
 										   true,
 										   true,
-										   InvalidOid,
+										   OIDOldToast,
 										   NULL);
 	Assert(toast_relid != InvalidOid);
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 39185cc5b9..9d22f648a8 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -735,7 +735,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
 		if (isNull)
 			reloptions = (Datum) 0;
 
-		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode);
+		NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode, toastid);
 
 		ReleaseSysCache(tuple);
 	}
@@ -1526,6 +1526,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 			RenameRelationInternal(toastidx,
 								   NewToastName, true, true);
+
+			/*
+			 * Reset the relrewrite for the toast. The command-counter
+			 * increment is required here as we are about to update
+			 * the tuple that is updated as part of RenameRelationInternal.
+			 */
+			CommandCounterIncrement();
+			ResetRelRewrite(newrel->rd_rel->reltoastrelid);
 		}
 		relation_close(newrel, NoLock);
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a03077139d..dbee6ae199 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3849,6 +3849,37 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	relation_close(targetrelation, NoLock);
 }
 
+/*
+ *		ResetRelRewrite - reset relrewrite
+ */
+void
+ResetRelRewrite(Oid myrelid)
+{
+	Relation	relrelation;	/* for RELATION relation */
+	HeapTuple	reltup;
+	Form_pg_class relform;
+
+	/*
+	 * Find relation's pg_class tuple.
+	 */
+	relrelation = table_open(RelationRelationId, RowExclusiveLock);
+
+	reltup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(myrelid));
+	if (!HeapTupleIsValid(reltup))	/* shouldn't happen */
+		elog(ERROR, "cache lookup failed for relation %u", myrelid);
+	relform = (Form_pg_class) GETSTRUCT(reltup);
+
+	/*
+	 * Update pg_class tuple.
+	 */
+	relform->relrewrite = InvalidOid;
+
+	CatalogTupleUpdate(relrelation, &reltup->t_self, reltup);
+
+	heap_freetuple(reltup);
+	table_close(relrelation, RowExclusiveLock);
+}
+
 /*
  * Disallow ALTER TABLE (and similar commands) when the current backend has
  * any open reference to the target table besides the one just acquired by
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index f3fa85f64e..7067c4c61b 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -21,7 +21,7 @@
  */
 extern void NewRelationCreateToastTable(Oid relOid, Datum reloptions);
 extern void NewHeapCreateToastTable(Oid relOid, Datum reloptions,
-									LOCKMODE lockmode);
+									LOCKMODE lockmode, Oid OIDOldToast);
 extern void AlterTableCreateToastTable(Oid relOid, Datum reloptions,
 									   LOCKMODE lockmode);
 extern void BootstrapToastTable(char *relName,
diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h
index 14f4b4882f..336549cc5f 100644
--- a/src/include/commands/tablecmds.h
+++ b/src/include/commands/tablecmds.h
@@ -78,6 +78,8 @@ extern void RenameRelationInternal(Oid myrelid,
 								   const char *newrelname, bool is_internal,
 								   bool is_index);
 
+extern void ResetRelRewrite(Oid myrelid);
+
 extern void find_composite_type_dependencies(Oid typeOid,
 											 Relation origRelation,
 											 const char *origTypeName);
#21Amit Kapila
amit.kapila16@gmail.com
In reply to: Drouvot, Bertrand (#20)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Wed, Aug 18, 2021 at 8:09 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,

On 8/18/21 12:01 PM, Amit Kapila wrote:

On Wed, Aug 18, 2021 at 1:27 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,

I've updated the comment and prepared the back patch versions:

I have verified and all your patches look good to me. I'll push and
backpatch this by Wednesday unless there are more comments.

--
With Regards,
Amit Kapila.

#22Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Amit Kapila (#21)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

Hi,

On 8/23/21 12:02 PM, Amit Kapila wrote:

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

On Wed, Aug 18, 2021 at 8:09 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,

On 8/18/21 12:01 PM, Amit Kapila wrote:

On Wed, Aug 18, 2021 at 1:27 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,

I've updated the comment and prepared the back patch versions:

I have verified and all your patches look good to me. I'll push and
backpatch this by Wednesday unless there are more comments.

I just saw that the patch has been committed.

Thanks for your help and time on this.

I'll mark the corresponding commitfest entry as "Committed".

Thanks

Bertrand

#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Drouvot, Bertrand (#22)
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

On Wed, Aug 25, 2021 at 11:26 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

I just saw that the patch has been committed.

Thanks for your help and time on this.

I'll mark the corresponding commitfest entry as "Committed".

Thanks for your work on this. I have also marked it closed in
PostgreSQL_14_Open_Items doc [1]https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items

[1]: https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items

--
With Regards,
Amit Kapila.