[BUG]Update Toast data failure in logical replication
Hi
I think I just found a bug in logical replication. Data couldn't be synchronized while updating toast data. Could anyone take a look at it?
Here is the steps to proceduce the BUG:
------publisher------
CREATE TABLE toasted_key (
id serial,
toasted_key text PRIMARY KEY,
toasted_col1 text,
toasted_col2 text
);
CREATE PUBLICATION pub FOR TABLE toasted_key;
------subscriber------
CREATE TABLE toasted_key (
id serial,
toasted_key text PRIMARY KEY,
toasted_col1 text,
toasted_col2 text
);
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' PUBLICATION pub;
------publisher------
ALTER TABLE toasted_key ALTER COLUMN toasted_key SET STORAGE EXTERNAL;
ALTER TABLE toasted_key ALTER COLUMN toasted_col1 SET STORAGE EXTERNAL;
INSERT INTO toasted_key(toasted_key, toasted_col1) VALUES(repeat('1234567890', 200), repeat('9876543210', 200));
UPDATE toasted_key SET toasted_col2 = toasted_col1;
------subscriber------
SELECT count(*) FROM toasted_key WHERE toasted_col2 = toasted_col1;
The above command is supposed to output "count = 1" but in fact it outputs "count = 0" which means UPDATE operation failed at the subscriber. Right?
I debugged and found the subscriber could receive message from publisher, and in apply_handle_update_internal function, it invoked FindReplTupleInLocalRel function but failed to find a tuple.
FYI, I also tested DELETE operation(DELETE FROM toasted_key;), which also invoked FindReplTupleInLocalRel function, and the result is ok.
Regards
Tang
On Friday, May 28, 2021 2:16 PM, tanghy.fnst@fujitsu.com wrote:
I think I just found a bug in logical replication. Data couldn't be synchronized while updating toast data.
Could anyone take a look at it?
FYI. The problem also occurs in PG-13. I will try to check from which version it got introduced.
Regards,
Tang
On Friday, May 28, 2021 3:02 PM, tanghy.fnst@fujitsu.com wrote:
FYI. The problem also occurs in PG-13. I will try to check from which version it
got introduced.
I reproduced it in PG-10,11,12,13.
I think the problem has been existing since Logical replication introduced in PG-10.
Regards
Tang
On Fri, May 28, 2021 at 12:31 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
On Friday, May 28, 2021 3:02 PM, tanghy.fnst@fujitsu.com wrote:
FYI. The problem also occurs in PG-13. I will try to check from which version it
got introduced.I reproduced it in PG-10,11,12,13.
I think the problem has been existing since Logical replication introduced in PG-10.
Seems you did not set the replica identity for updating the tuple.
Try this before updating, and it should work.
ALTER TABLE toasted_key REPLICA IDENTITY USING INDEX toasted_key_pkey;
or
ALTER TABLE toasted_key REPLICA IDENTITY FULL.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Friday, May 28, 2021 6:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Seems you did not set the replica identity for updating the tuple.
Try this before updating, and it should work.
Thanks for your reply. I tried it.
ALTER TABLE toasted_key REPLICA IDENTITY USING INDEX toasted_key_pkey;
This didn't work.
or
ALTER TABLE toasted_key REPLICA IDENTITY FULL.
It worked.
And I noticed if the length of PRIMARY KEY (toasted_key) is short, data could be synchronized successfully with default replica identity.
Could you tell me why we need to set replica identity?
Regards
Tang
On Mon, May 31, 2021 at 8:04 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
On Friday, May 28, 2021 6:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Seems you did not set the replica identity for updating the tuple.
Try this before updating, and it should work.Thanks for your reply. I tried it.
ALTER TABLE toasted_key REPLICA IDENTITY USING INDEX toasted_key_pkey;
This didn't work.
or
ALTER TABLE toasted_key REPLICA IDENTITY FULL.
It worked.
And I noticed if the length of PRIMARY KEY (toasted_key) is short, data could be synchronized successfully with default replica identity.
Could you tell me why we need to set replica identity?
Looks like some problem if the replica identity is an index and the
value is stored externally, I will debug this and let you know.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, May 31, 2021 at 12:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, May 31, 2021 at 8:04 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:On Friday, May 28, 2021 6:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Seems you did not set the replica identity for updating the tuple.
Try this before updating, and it should work.Thanks for your reply. I tried it.
ALTER TABLE toasted_key REPLICA IDENTITY USING INDEX toasted_key_pkey;
This didn't work.
or
ALTER TABLE toasted_key REPLICA IDENTITY FULL.
It worked.
And I noticed if the length of PRIMARY KEY (toasted_key) is short, data could be synchronized successfully with default replica identity.
Could you tell me why we need to set replica identity?Looks like some problem if the replica identity is an index and the
value is stored externally, I will debug this and let you know.
The problem is if the key attribute is not changed we don't log it as
it should get logged along with the updated tuple, but if it is
externally stored then the complete key will never be logged because
there is no log from the toast table. For fixing this if the key is
externally stored then always log that.
Please test with the attached patch.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v1-0001-Extract-unchanged-replica-identity-key-if-it-is-s.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Extract-unchanged-replica-identity-key-if-it-is-s.patchDownload
From 965c18471919e231ec94edac9025f7178f121206 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Mon, 31 May 2021 14:37:26 +0530
Subject: [PATCH v1] Extract unchanged replica identity key if it is stored
externally
If replica identity is set to key and the key is not modified we don't log
key seperately because it should be logged along with the updated tuple. But
if the key is stored externally we must have to detoast and log it separately.
---
src/backend/access/heap/heapam.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bd60129..27d60e6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8407,8 +8407,11 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /*
+ * if the key hasn't changedand we're only logging the key, we're done.
+ * But if tuple has external data then we might have to detoast the key.
+ */
+ if ((!key_changed) && !HeapTupleHasExternal(tp))
return NULL;
/* find out the replica identity columns */
--
1.8.3.1
On Mon, May 31, 2021 5:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
The problem is if the key attribute is not changed we don't log it as
it should get logged along with the updated tuple, but if it is
externally stored then the complete key will never be logged because
there is no log from the toast table. For fixing this if the key is
externally stored then always log that.Please test with the attached patch.
Thanks for your patch. I tested it and the bug was fixed.
I'm still trying to understand your fix, please allow me to ask more(maybe silly) questions if I found any.
+ * if the key hasn't changedand we're only logging the key, we're done.
I think "changedand" should be "changed and".
Regards
Tang
On Mon, 31 May 2021 at 3:33 PM, tanghy.fnst@fujitsu.com <
tanghy.fnst@fujitsu.com> wrote:
On Mon, May 31, 2021 5:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
The problem is if the key attribute is not changed we don't log it as
it should get logged along with the updated tuple, but if it is
externally stored then the complete key will never be logged because
there is no log from the toast table. For fixing this if the key is
externally stored then always log that.Please test with the attached patch.
Thanks for your patch. I tested it and the bug was fixed.
Thanks for confirming this.
I'm still trying to understand your fix, please allow me to ask more(maybe
silly) questions if I found any.+ * if the key hasn't changedand we're only logging the key, we're
done.I think "changedand" should be "changed and".
Okay, I will fix it. Lets see what others have to say about this fix, if
we agree with this then I think we might have to change the test output. I
will do that in the next version along with your comment fix.
Hi
I have some questions with your patch. Here are two cases I used to check the bug.
Case1(PK toasted_key is short), data could be synchronized on HEAD.
---------------
INSERT INTO toasted_key(toasted_key, toasted_col1) VALUES('111', repeat('9876543210', 200));
UPDATE toasted_key SET toasted_col2 = toasted_col1;
---------------
Case2(PK toasted_key is very long), data couldn’t be synchronized on HEAD.(which is the bug)
---------------
INSERT INTO toasted_key(toasted_key, toasted_col1) VALUES(repeat('9876543210', 200), '111');
UPDATE toasted_key SET toasted_col2 = toasted_col1;
---------------
So I think the bug is only related with the length of primary key.
I noticed that in case1, ExtractReplicaIdentity function returned NULL on HEAD. But after your fix, it didn’t return NULL. There is no problem with this case on HEAD, but the patch modified its return value. I’m not sure if it would bring new problems. Have you checked it?
Regards
Tang
On Tue, Jun 1, 2021 at 12:29 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
Hi
I have some questions with your patch. Here are two cases I used to check the bug.
Case1(PK toasted_key is short), data could be synchronized on HEAD.
---------------
INSERT INTO toasted_key(toasted_key, toasted_col1) VALUES('111', repeat('9876543210', 200));
UPDATE toasted_key SET toasted_col2 = toasted_col1;
---------------
Case2(PK toasted_key is very long), data couldn’t be synchronized on HEAD.(which is the bug)
---------------
INSERT INTO toasted_key(toasted_key, toasted_col1) VALUES(repeat('9876543210', 200), '111');
UPDATE toasted_key SET toasted_col2 = toasted_col1;
---------------
So I think the bug is only related with the length of primary key.
I noticed that in case1, ExtractReplicaIdentity function returned NULL on HEAD. But after your fix, it didn’t return NULL. There is no problem with this case on HEAD, but the patch modified its return value. I’m not sure if it would bring new problems. Have you checked it?
Good observation, basically, my check says that any field in the tuple
is toasted then prepare the key tuple, actually, after that, I should
recheck whether the key field specifically toasted or not and if it is
not then we can continue returning NULL. I will fix this in the next
version.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 1, 2021 at 3:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Jun 1, 2021 at 12:29 PM tanghy.fnst@fujitsu.com
I noticed that in case1, ExtractReplicaIdentity function returned NULL on HEAD. But after your fix, it didn’t return NULL. There is no problem with this case on HEAD, but the patch modified its return value. I’m not sure if it would bring new problems. Have you checked it?
Good observation, basically, my check says that any field in the tuple
is toasted then prepare the key tuple, actually, after that, I should
recheck whether the key field specifically toasted or not and if it is
not then we can continue returning NULL. I will fix this in the next
version.
Attached patch fixes that, I haven't yet added the test case. Once
someone confirms on the approach then I will add a test case to the
patch.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v2-0001-Extract-unchanged-replica-identity-key-if-it-is-s.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Extract-unchanged-replica-identity-key-if-it-is-s.patchDownload
From 3be161266dd013b8fdd68352dc1c6bd34ed36069 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Mon, 31 May 2021 14:37:26 +0530
Subject: [PATCH v2] Extract unchanged replica identity key if it is stored
externally
If replica identity is set to key and the key is not modified we don't log
key seperately because it should be logged along with the updated tuple. But
if the key is stored externally we must have to detoast and log it separately.
---
src/backend/access/heap/heapam.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bd60129..9b53cdc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8407,8 +8407,11 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /*
+ * if the key hasn't changed and we're only logging the key, we're done.
+ * But if tuple has external data then we might have to detoast the key.
+ */
+ if ((!key_changed) && !HeapTupleHasExternal(tp))
return NULL;
/* find out the replica identity columns */
@@ -8459,6 +8462,15 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
key_tuple = toast_flatten_tuple(oldtup, desc);
heap_freetuple(oldtup);
}
+ /*
+ * If key tuple doesn't have any external data and key is not changed then
+ * just free the key tuple and return NULL.
+ */
+ else if (!key_changed)
+ {
+ heap_freetuple(key_tuple);
+ return NULL;
+ }
return key_tuple;
}
--
1.8.3.1
On Wed, Jun 2, 2021 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Attached patch fixes that, I haven't yet added the test case. Once
someone confirms on the approach then I will add a test case to the
patch.
key_tuple = heap_form_tuple(desc, values, nulls);
*copy = true;
...
key_tuple = toast_flatten_tuple(oldtup, desc);
heap_freetuple(oldtup);
}
+ /*
+ * If key tuple doesn't have any external data and key is not changed then
+ * just free the key tuple and return NULL.
+ */
+ else if (!key_changed)
+ {
+ heap_freetuple(key_tuple);
+ return NULL;
+ }
return key_tuple;
}
I think "*copy = false" should be added before return NULL because we don't return a modified copy tuple here. Thoughts?
Regards
Tang
On Wed, Jun 2, 2021 at 2:37 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
On Wed, Jun 2, 2021 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Attached patch fixes that, I haven't yet added the test case. Once
someone confirms on the approach then I will add a test case to the
patch.key_tuple = heap_form_tuple(desc, values, nulls); *copy = true; ... key_tuple = toast_flatten_tuple(oldtup, desc); heap_freetuple(oldtup); } + /* + * If key tuple doesn't have any external data and key is not changed then + * just free the key tuple and return NULL. + */ + else if (!key_changed) + { + heap_freetuple(key_tuple); + return NULL; + }return key_tuple;
}I think "*copy = false" should be added before return NULL because we don't return a modified copy tuple here. Thoughts?
Yes, you are right. I will change it in the next version, along with
the test case.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jun 2, 2021 at 3:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Yes, you are right. I will change it in the next version, along with
the test case.
+ /*
+ * if the key hasn't changed and we're only logging the key, we're done.
+ * But if tuple has external data then we might have to detoast the key.
+ */
This doesn't really mention why we need to detoast the key even when
the key remains the same. I guess we can add some more details here.
--
Thanks & Regards,
Kuntal Ghosh
On Wed, Jun 2, 2021 at 7:20 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Wed, Jun 2, 2021 at 3:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Yes, you are right. I will change it in the next version, along with
the test case.+ /* + * if the key hasn't changed and we're only logging the key, we're done. + * But if tuple has external data then we might have to detoast the key. + */ This doesn't really mention why we need to detoast the key even when the key remains the same. I guess we can add some more details here.
Noted, let's see what others have to say about fixing this, then I
will fix this along with one other pending comment and I will also add
the test case. Thanks for looking into this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jun 2, 2021 at 7:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Jun 2, 2021 at 7:20 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Wed, Jun 2, 2021 at 3:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Yes, you are right. I will change it in the next version, along with
the test case.+ /* + * if the key hasn't changed and we're only logging the key, we're done. + * But if tuple has external data then we might have to detoast the key. + */ This doesn't really mention why we need to detoast the key even when the key remains the same. I guess we can add some more details here.Noted, let's see what others have to say about fixing this, then I
will fix this along with one other pending comment and I will also add
the test case. Thanks for looking into this.
I have fixed all the pending issues, I see there is already a test
case for this so I have changed the output for that.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v3-0001-Extract-unchanged-replica-identity-key-if-it-is-s.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Extract-unchanged-replica-identity-key-if-it-is-s.patchDownload
From fe1b9131c6c53a6335dd4fd7ceb5f8c18aacdba9 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Mon, 31 May 2021 14:37:26 +0530
Subject: [PATCH v3] Extract unchanged replica identity key if it is stored
externally
If replica identity is set to key and the key is not modified we don't log
key seperately because it should be logged along with the updated tuple. But
if the key is stored externally we must have to detoast and log it separately.
---
contrib/test_decoding/expected/toast.out | 2 +-
src/backend/access/heap/heapam.c | 20 ++++++++++++++++++--
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 75c4d22..769ab0e 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2433998..afd775e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8339,8 +8339,14 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /*
+ * If the key hasn't changed and we're only logging the key, we're done.
+ * But if tuple has external data then it's possible that key might have
+ * stored externally, if so we need to extract its value because the value
+ * of the externally stored key data will not be logged with the main tuple
+ * so we will have to log as old key tuple for replica identity.
+ */
+ if ((!key_changed) && !HeapTupleHasExternal(tp))
return NULL;
/* find out the replica identity columns */
@@ -8391,6 +8397,16 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
key_tuple = toast_flatten_tuple(oldtup, desc);
heap_freetuple(oldtup);
}
+ /*
+ * If key tuple doesn't have any external data and key is not changed then
+ * just free the key tuple and return NULL.
+ */
+ else if (!key_changed)
+ {
+ heap_freetuple(key_tuple);
+ *copy = false;
+ return NULL;
+ }
return key_tuple;
}
--
1.8.3.1
On Thu, Jun 3, 2021 7:45 PMDilip Kumar <dilipbalaut@gmail.com> wrote:
I have fixed all the pending issues, I see there is already a test
case for this so I have changed the output for that.
Thanks for your patch. I tested it for all branches(10,11,12,13,HEAD) and all of them passed.(This bug was introduced in PG-10.)
I also tested the scenario where I found this bug, data could be synchronized after your fix.
Regards
Tang
On Fri, Jun 4, 2021 at 8:25 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
On Thu, Jun 3, 2021 7:45 PMDilip Kumar <dilipbalaut@gmail.com> wrote:
I have fixed all the pending issues, I see there is already a test
case for this so I have changed the output for that.Thanks for your patch. I tested it for all branches(10,11,12,13,HEAD) and all of them passed.(This bug was introduced in PG-10.)
I also tested the scenario where I found this bug, data could be synchronized after your fix.
Thanks for verifying this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 3, 2021 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Jun 2, 2021 at 7:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Jun 2, 2021 at 7:20 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Wed, Jun 2, 2021 at 3:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Yes, you are right. I will change it in the next version, along with
the test case.+ /* + * if the key hasn't changed and we're only logging the key, we're done. + * But if tuple has external data then we might have to detoast the key. + */ This doesn't really mention why we need to detoast the key even when the key remains the same. I guess we can add some more details here.Noted, let's see what others have to say about fixing this, then I
will fix this along with one other pending comment and I will also add
the test case. Thanks for looking into this.I have fixed all the pending issues, I see there is already a test
case for this so I have changed the output for that.
IIUC, this issue occurs because we don't log the actual key value for
unchanged toast key. It is neither logged as part of old_key_tuple nor
for new tuple due to which we are not able to find it in the
apply-side when we searched it via FindReplTupleInLocalRel. Now, I
think it will work if we LOG the entire unchanged toasted value as you
have done in the patch but can we explore some other way to fix it. In
the subscriber-side, can we detect that the key column has toasted
value in the new tuple and try to first fetch it and then do the index
search for the fetched toasted value? I am not sure about the
feasibility of this but wanted to see if we can someway avoid logging
unchanged toasted key value as that might save us from additional WAL.
--
With Regards,
Amit Kapila.
On Thu, Jul 22, 2021 at 4:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 3, 2021 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Jun 2, 2021 at 7:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Jun 2, 2021 at 7:20 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Wed, Jun 2, 2021 at 3:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Yes, you are right. I will change it in the next version, along with
the test case.+ /* + * if the key hasn't changed and we're only logging the key, we're done. + * But if tuple has external data then we might have to detoast the key. + */ This doesn't really mention why we need to detoast the key even when the key remains the same. I guess we can add some more details here.Noted, let's see what others have to say about fixing this, then I
will fix this along with one other pending comment and I will also add
the test case. Thanks for looking into this.I have fixed all the pending issues, I see there is already a test
case for this so I have changed the output for that.IIUC, this issue occurs because we don't log the actual key value for
unchanged toast key. It is neither logged as part of old_key_tuple nor
for new tuple due to which we are not able to find it in the
apply-side when we searched it via FindReplTupleInLocalRel. Now, I
think it will work if we LOG the entire unchanged toasted value as you
have done in the patch but can we explore some other way to fix it. In
the subscriber-side, can we detect that the key column has toasted
value in the new tuple and try to first fetch it and then do the index
search for the fetched toasted value? I am not sure about the
feasibility of this but wanted to see if we can someway avoid logging
unchanged toasted key value as that might save us from additional WAL.
Yeah if we can do this then it will be a better approach, I think as
you mentioned we can avoid logging unchanged toast key data. I will
investigate this next week and update the thread.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jul 22, 2021 at 8:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Jul 22, 2021 at 4:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 3, 2021 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Jun 2, 2021 at 7:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Jun 2, 2021 at 7:20 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Wed, Jun 2, 2021 at 3:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Yes, you are right. I will change it in the next version, along with
the test case.+ /* + * if the key hasn't changed and we're only logging the key, we're done. + * But if tuple has external data then we might have to detoast the key. + */ This doesn't really mention why we need to detoast the key even when the key remains the same. I guess we can add some more details here.Noted, let's see what others have to say about fixing this, then I
will fix this along with one other pending comment and I will also add
the test case. Thanks for looking into this.I have fixed all the pending issues, I see there is already a test
case for this so I have changed the output for that.IIUC, this issue occurs because we don't log the actual key value for
unchanged toast key. It is neither logged as part of old_key_tuple nor
for new tuple due to which we are not able to find it in the
apply-side when we searched it via FindReplTupleInLocalRel. Now, I
think it will work if we LOG the entire unchanged toasted value as you
have done in the patch but can we explore some other way to fix it. In
the subscriber-side, can we detect that the key column has toasted
value in the new tuple and try to first fetch it and then do the index
search for the fetched toasted value? I am not sure about the
feasibility of this but wanted to see if we can someway avoid logging
unchanged toasted key value as that might save us from additional WAL.Yeah if we can do this then it will be a better approach, I think as
you mentioned we can avoid logging unchanged toast key data. I will
investigate this next week and update the thread.
Okay, thanks. I think one point we need to consider here is that on
the subscriber side, we use dirtysnapshot to search the key, so we
need to ensure that we don't fetch the wrong data. I am not sure what
will happen when by the time we try to search the tuple in the
subscriber-side for the update, it has been removed and re-inserted
with the same values by the user. Do we find the newly inserted tuple
and update it? If so, can it also happen even if logged the unchanged
old_key_tuple as the patch is doing currently?
--
With Regards,
Amit Kapila.
On Fri, Jul 23, 2021 at 8:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Okay, thanks. I think one point we need to consider here is that on
the subscriber side, we use dirtysnapshot to search the key, so we
need to ensure that we don't fetch the wrong data. I am not sure what
will happen when by the time we try to search the tuple in the
subscriber-side for the update, it has been removed and re-inserted
with the same values by the user. Do we find the newly inserted tuple
and update it? If so, can it also happen even if logged the unchanged
old_key_tuple as the patch is doing currently?
I was thinking more about this idea, but IMHO, unless we send the key
toasted tuple from the publisher how is the subscriber supposed to
fetch it. Because that is the key value for finding the tuple on the
subscriber side and if we haven't sent the key value, how are we
supposed to find the tuple on the subscriber side?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jul 26, 2021 at 10:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I was thinking more about this idea, but IMHO, unless we send the key
toasted tuple from the publisher how is the subscriber supposed to
fetch it. Because that is the key value for finding the tuple on the
subscriber side and if we haven't sent the key value, how are we
supposed to find the tuple on the subscriber side?
I was thinking of using toast pointer but that won't work because it
can be different on the subscriber-side. I don't see any better ideas
to fix this issue. This problem seems to be from the time Logical
Replication has been introduced, so adding others (who are generally
involved in this area) to see what they think about this bug? I think
people might not be using toasted columns for Replica Identity due to
which this problem has been reported yet but I feel this is quite a
fundamental issue and we should do something about this.
Let me summarize the problem for the ease of others.
The logical replica can go out of sync for UPDATES when there is a
toast column as part of REPLICA IDENTITY. In such cases, updates are
not replicated if the key column doesn't change because we don't log
the actual key value for the unchanged toast key. It is neither logged
as part of old_key_tuple nor for new tuple due to which we are not
able to find the tuple to be updated on the subscriber-side and the
update is ignored on the subscriber-side. We log this in DEBUG1 mode
but I don't think the user can do anything about this and the replica
will go out-of-sync. This works when the replica identity column value
is not toasted because then it will be part of the new tuple and we
use that to fetch the tuple on the subscriber.
Now, it is not clear if the key-value (for the toast column which is
part of replica identity) is not present in WAL how we can find the
tuple to update on subscriber? We can't use the toast pointer in the
new tuple to fetch the toast information as that can be different on
subscribers. The simple way is to WAL LOG the unchanged toasted value
as part of old_key_tuple, this will be required only for toast
columns.
Note that Delete works because we WAL Log the unchanged key tuple in that case.
--
With Regards,
Amit Kapila.
On Fri, Jul 30, 2021 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
This problem seems to be from the time Logical
Replication has been introduced, so adding others (who are generally
involved in this area) to see what they think about this bug? I think
people might not be using toasted columns for Replica Identity due to
which this problem has been reported yet but I feel this is quite a
fundamental issue and we should do something about this.Let me summarize the problem for the ease of others.
The logical replica can go out of sync for UPDATES when there is a
toast column as part of REPLICA IDENTITY. In such cases, updates are
not replicated if the key column doesn't change because we don't log
the actual key value for the unchanged toast key. It is neither logged
as part of old_key_tuple nor for new tuple due to which we are not
able to find the tuple to be updated on the subscriber-side and the
update is ignored on the subscriber-side. We log this in DEBUG1 mode
but I don't think the user can do anything about this and the replica
will go out-of-sync. This works when the replica identity column value
is not toasted because then it will be part of the new tuple and we
use that to fetch the tuple on the subscriber.
It seems to me this problem exists from the time we introduced
wal_level = logical in the commit e55704d8b2 [1]commit e55704d8b2fe522fbc9435acbb5bc59033478bd5 Author: Robert Haas <rhaas@postgresql.org> Date: Tue Dec 10 18:33:45 2013 -0500, or another
possibility is that logical replication commit didn't consider
something to make it work. Andres, Robert, Petr, can you guys please
comment because otherwise, we might miss something here.
[1]: commit e55704d8b2fe522fbc9435acbb5bc59033478bd5 Author: Robert Haas <rhaas@postgresql.org> Date: Tue Dec 10 18:33:45 2013 -0500
commit e55704d8b2fe522fbc9435acbb5bc59033478bd5
Author: Robert Haas <rhaas@postgresql.org>
Date: Tue Dec 10 18:33:45 2013 -0500
Add new wal_level, logical, sufficient for logical decoding.
When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba983ef79be4a84fcd0e0ca3b5fcb85dd65. This makes it possible a
properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns, or,
with REPLICA IDENTITY FULL, even if the table has no key at all. Note
that updates which do not modify the replica identity column won't log
anything extra, making the choice of a good key (i.e. one that will
rarely be changed) important to performance when wal_level=logical is
configured.
..
Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.
--
With Regards,
Amit Kapila.
On 2021-Jul-30, Amit Kapila wrote:
I was thinking of using toast pointer but that won't work because it
can be different on the subscriber-side. I don't see any better ideas
to fix this issue. This problem seems to be from the time Logical
Replication has been introduced, so adding others (who are generally
involved in this area) to see what they think about this bug? I think
people might not be using toasted columns for Replica Identity due to
which this problem has been reported yet but I feel this is quite a
fundamental issue and we should do something about this.
In the evening before going offline a week ago I was looking at this and
my conclusion was that this was a legitimate problem: the original
implementation is faulty in that the full detoasted value is required to
be transmitted in order for downstream to be able to read the value.
I am not sure if at the level of logical decoding it is a problem
theoretically, but at least for logical replication it is clearly a
practical problem.
Reading Dilip's last posted patch that day, I had some reservations
about the API of ExtractReplicaIdentity. The new argument makes for a
very strange to explain behavior "return the key values if they are
unchanged, *or* if they are toasted" ... ??? I tried to make sense of
that, and tried to find a concept that would make sense to the whole,
but couldn't find any obvious angle in the short time I looked at it.
I haven't looked at it again.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)
On Tue, Aug 10, 2021 at 8:08 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Jul-30, Amit Kapila wrote:
Reading Dilip's last posted patch that day, I had some reservations
about the API of ExtractReplicaIdentity. The new argument makes for a
very strange to explain behavior "return the key values if they are
unchanged, *or* if they are toasted" ... ???
I think we can say it as "Return the key values if they are changed
*or* if they are toasted". Currently, we have an exception for Deletes
where the caller always passed key_changed as true, so maybe we can
have a similar exception when the tuple has toasted values. Can we
think of changing the flag to "key_required" instead of "key_changed"
and let the caller identify and set its value? For Deletes, it will
work the same but for Updates, the caller needs to compute it by
checking if any of the key columns are modified or has a toast value.
We can try to see if the caller can identify it cheaply along with
determining the modified_attrs as at that time we will anyway check
replica key attrs.
Currently, in proposed patch first, we check that the tuple has any
toast values and then it deforms and forms the new key tuple. After
that, it checks if the key has any toast values and then only decides
to return the tuple. If as described in the previous paragraph, we can
cheaply identify whether the key has toasted values, then we can avoid
deform/form cost in some cases. Also, I think we need to change the
Replica Identity description in the docs[1]https://www.postgresql.org/docs/devel/sql-altertable.html.
[1]: https://www.postgresql.org/docs/devel/sql-altertable.html
--
With Regards,
Amit Kapila.
On Wed, Aug 11, 2021 at 10:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 10, 2021 at 8:08 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Jul-30, Amit Kapila wrote:
Reading Dilip's last posted patch that day, I had some reservations
about the API of ExtractReplicaIdentity. The new argument makes for a
very strange to explain behavior "return the key values if they are
unchanged, *or* if they are toasted" ... ???I think we can say it as "Return the key values if they are changed
*or* if they are toasted". Currently, we have an exception for Deletes
where the caller always passed key_changed as true, so maybe we can
have a similar exception when the tuple has toasted values. Can we
think of changing the flag to "key_required" instead of "key_changed"
and let the caller identify and set its value? For Deletes, it will
work the same but for Updates, the caller needs to compute it by
checking if any of the key columns are modified or has a toast value.
We can try to see if the caller can identify it cheaply along with
determining the modified_attrs as at that time we will anyway check
replica key attrs.
Right
Currently, in proposed patch first, we check that the tuple has any
toast values and then it deforms and forms the new key tuple. After
that, it checks if the key has any toast values and then only decides
to return the tuple. If as described in the previous paragraph, we can
cheaply identify whether the key has toasted values, then we can avoid
deform/form cost in some cases. Also, I think we need to change the
Replica Identity description in the docs[1].
Yeah we can avoid that by detecting any toasted replica identity key
in HeapDetermineModifiedColumns, check the attached patch.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v4-0001-Extract-unchanged-external-replica-identity-key.patchapplication/x-patch; name=v4-0001-Extract-unchanged-external-replica-identity-key.patchDownload
From 3879809bf1a995995dccf5064eb9282f72af0412 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 11 Aug 2021 18:09:25 +0530
Subject: [PATCH v4] Extract unchanged external replica identity key
If replica identity is set to key and the key is not modified we don't log
key seperately because it should be logged along with the updated tuple. But
if the key is stored externally we must have to detoast and log it separately.
---
contrib/test_decoding/expected/toast.out | 2 +-
src/backend/access/heap/heapam.c | 51 ++++++++++++++++++++++----------
2 files changed, 37 insertions(+), 16 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 75c4d22..769ab0e 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2433998..e788926 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -80,7 +80,9 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
bool all_visible_cleared, bool new_all_visible_cleared);
static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+ Bitmapset *check_external_attr,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -106,7 +108,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
static void index_delete_sort(TM_IndexDeleteOp *delstate);
static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3185,6 +3187,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3282,7 +3285,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* Determine columns modified by the update. */
modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3883,10 +3887,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4047,7 +4053,8 @@ l2:
*/
static bool
heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+ HeapTuple tup1, HeapTuple tup2,
+ bool *old_has_external)
{
Datum value1,
value2;
@@ -4083,6 +4090,13 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
+ /* check whether the value is stored externally or not and set the flag */
+ if (!isnull1 && TupleDescAttr(tupdesc, attrnum - 1)->attlen == -1 &&
+ VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)))
+ *old_has_external = true;
+ else
+ *old_has_external = false;
+
/*
* If one value is NULL and other is not, then they are certainly not
* equal
@@ -4129,19 +4143,25 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
*/
static Bitmapset *
HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+ Bitmapset *check_external_attr,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external)
{
int attnum;
Bitmapset *modified = NULL;
+ bool has_external = false;
while ((attnum = bms_first_member(interesting_cols)) >= 0)
{
attnum += FirstLowInvalidHeapAttributeNumber;
if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ attnum, oldtup, newtup, &has_external))
modified = bms_add_member(modified,
attnum - FirstLowInvalidHeapAttributeNumber);
+ if (has_external && bms_is_member(attnum - FirstLowInvalidHeapAttributeNumber,
+ check_external_attr))
+ *id_has_external = true;
}
return modified;
@@ -8300,14 +8320,15 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * key_required should be false if caller knows that no replica identity
+ * columns changed value and it doesn't has any external data.
+ * It's always true in the DELETE case.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -8340,7 +8361,7 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
}
/* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -8348,10 +8369,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes constant
+ * true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
On Wed, Aug 11, 2021 at 10:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Yeah we can avoid that by detecting any toasted replica identity key
in HeapDetermineModifiedColumns, check the attached patch.
The patch applies cleanly, all tests pass, I tried out a few toast
combination tests and they seem to be working fine.
No review comments, the patch looks good to me.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Aug 11, 2021 at 10:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Yeah we can avoid that by detecting any toasted replica identity key
in HeapDetermineModifiedColumns, check the attached patch.
I had a second look at this, and I just had a small doubt. Since the
convention is that for UPDATES, the old tuple/key is written to
WAL only if the one of the columns in the key has changed as part of
the update, and we are breaking that convention with this patch by
also including
the old key if it is toasted and is stored in disk even if it is not changed.
Why do we not include the detoasted key as part of the new tuple
rather than the old tuple? Then we don't really break this convention.
And one small typo in the patch:
The header above ExtractReplicaIdentity()
Before:
* key_required should be false if caller knows that no replica identity
* columns changed value and it doesn't has any external data.
* It's always true in the DELETE case.
After:
* key_required should be false if caller knows that no replica identity
* columns changed value and it doesn't have any external data.
* It's always true in the DELETE case.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Sep 8, 2021 at 11:26 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Wed, Aug 11, 2021 at 10:45 PM Dilip Kumar <dilipbalaut@gmail.com>
wrote:Yeah we can avoid that by detecting any toasted replica identity key
in HeapDetermineModifiedColumns, check the attached patch.I had a second look at this, and I just had a small doubt. Since the
convention is that for UPDATES, the old tuple/key is written to
WAL only if the one of the columns in the key has changed as part of
the update, and we are breaking that convention with this patch by
also including
the old key if it is toasted and is stored in disk even if it is not
changed.
Why do we not include the detoasted key as part of the new tuple
rather than the old tuple? Then we don't really break this convention.
The purpose of including the toasted old data is only for the replica
identity, but if you include it in the new tuple then it will affect the
general wal replay, basically, now you will have large detoasted data in
your new tuple which your are directly going to memcpy on the standby while
replaying so that will create corruption. So basically, you can not
include this in the new tuple without changing a lot of logic around replay
which is completely useless.
So let this tuple be for a specific purpose and that is replica identity in
our case.
And one small typo in the patch:
The header above ExtractReplicaIdentity()
Before:
* key_required should be false if caller knows that no replica identity
* columns changed value and it doesn't has any external data.
* It's always true in the DELETE case.After:
* key_required should be false if caller knows that no replica identity
* columns changed value and it doesn't have any external data.
* It's always true in the DELETE case.
Okay, I will change this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 11, 2021 at 06:14:55PM +0530, Dilip Kumar wrote:
Right
Amit, are you planning to look more at this patch? It has been a
couple of months since the last update, and this is still a bug as far
as I understand.
FWIW, I find the API changes of HeapDetermineModifiedColumns() and
ExtractReplicaIdentity() a bit grotty. Shouldn't we try to flatten
the old tuple instead? There are things like
toast_flatten_tuple_to_datum() for this purpose if a tuple satisfies
HeapTupleHasExternal(), or just heap_copy_tuple_as_datum().
--
Michael
On Mon, Jan 24, 2022 at 9:28 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Aug 11, 2021 at 06:14:55PM +0530, Dilip Kumar wrote:
Right
Amit, are you planning to look more at this patch? It has been a
couple of months since the last update, and this is still a bug as far
as I understand.FWIW, I find the API changes of HeapDetermineModifiedColumns() and
ExtractReplicaIdentity() a bit grotty. Shouldn't we try to flatten
the old tuple instead? There are things like
toast_flatten_tuple_to_datum() for this purpose if a tuple satisfies
HeapTupleHasExternal(), or just heap_copy_tuple_as_datum().
That can add overhead in cases where we don't need to log the toasted
values of the old tuple. We only need it for the case where we have
unchanged toasted replica identity columns. In the previous version
[1]: /messages/by-id/CAFiTN-sTS4bB7W3UJV3iUm=wKdr9EpOwyK97hNr77MzFQm_NBw@mail.gmail.com
have overhead as explained in the second paragraph of the email [2]/messages/by-id/CAA4eK1KgZr=QSBE_Qh0Qfb2ma1Tc6+ZxkMaUHO7aC7b9WSCRaw@mail.gmail.com.
Also, Alvaro seems to have some reservations about that change. I
don't know if there is a better way to fix this but I could be missing
something.
[1]: /messages/by-id/CAFiTN-sTS4bB7W3UJV3iUm=wKdr9EpOwyK97hNr77MzFQm_NBw@mail.gmail.com
[2]: /messages/by-id/CAA4eK1KgZr=QSBE_Qh0Qfb2ma1Tc6+ZxkMaUHO7aC7b9WSCRaw@mail.gmail.com
--
With Regards,
Amit Kapila.
On Mon, Jan 24, 2022, at 12:58 AM, Michael Paquier wrote:
FWIW, I find the API changes of HeapDetermineModifiedColumns() and
ExtractReplicaIdentity() a bit grotty. Shouldn't we try to flatten
the old tuple instead? There are things like
toast_flatten_tuple_to_datum() for this purpose if a tuple satisfies
HeapTupleHasExternal(), or just heap_copy_tuple_as_datum().
I checked v4 and I don't like the HeapDetermineModifiedColumns() and
heap_tuple_attr_equals() changes either. It seems it is hijacking these
functions to something else. I would suggest to change the signature to
static void
heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
HeapTuple tup1, HeapTuple tup2,
bool *is_equal, bool *key_has_external);
and
static void
HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
HeapTuple oldtup, HeapTuple newtup,
Bitmapset *modified_attrs, bool *key_has_external);
I didn't figure out a cheap way to check if the key has external value other
than slightly modifying the HeapDetermineModifiedColumns() function and its
subroutine heap_tuple_attr_equals(). As Alvaro said I don't think adding
HeapTupleHasExternal() (as in v3) is a good idea because it does not optimize
genuine cases such as a table whose PK is an integer and contains a single
TOAST column.
Another suggestion is to keep key_changed and add another attribute
(key_has_external) to ExtractReplicaIdentity(). If we need key_changed in the
future we'll have to decompose it again. You also encapsulates that
optimization into the function that helps with future improvements/fixes.
static HeapTuple
ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
bool key_has_external, bool *copy);
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Tue, Aug 10, 2021 at 1:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
It seems to me this problem exists from the time we introduced
wal_level = logical in the commit e55704d8b2 [1], or another
possibility is that logical replication commit didn't consider
something to make it work. Andres, Robert, Petr, can you guys please
comment because otherwise, we might miss something here.
I'm belatedly getting around to looking at this thread. My
recollection of this is:
I think we realized when we were working on the logical decoding stuff
that the key columns of the old tuple would have to be detoasted in
order for the mechanism to work, because I remember worrying about
whether it would potentially be a problem that the WAL record would
end up huge. However, I think we believed that the new tuple wouldn't
need to have the detoasted values, because logical decoding is
designed to notice all the TOAST insertions for the new tuple and
reassemble those separate chunks to get the original value back. And
off-hand I'm not sure why that logic doesn't apply just as much to the
key columns as any others.
But the evidence does suggest that there's some kind of bug here, so
evidently there's some flaw in that line of thinking. I'm not sure
off-hand what it is, though.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2022-01-24 15:10:05 -0500, Robert Haas wrote:
I think we realized when we were working on the logical decoding stuff
that the key columns of the old tuple would have to be detoasted in
order for the mechanism to work, because I remember worrying about
whether it would potentially be a problem that the WAL record would
end up huge. However, I think we believed that the new tuple wouldn't
need to have the detoasted values, because logical decoding is
designed to notice all the TOAST insertions for the new tuple and
reassemble those separate chunks to get the original value back.
Possibly the root of the problem is that we/I didn't think of cases where the
primary key is an external toast datum - in moast scenarios you'd an error
about a too wide index tuple. But of course that neglects cases where toasting
happens due to SET STORAGE or due to the aggregate tuple width, rather than
individual column width.
And off-hand I'm not sure why that logic doesn't apply just as much to the
key columns as any others.
The difference is that it's mostly fine to not have unchanging key columns as
part of decoded update - you just don't update those columns. But you can't do
that without knowing the replica identity...
Greetings,
Andres Freund
On Mon, Jan 24, 2022 at 4:17 PM Andres Freund <andres@anarazel.de> wrote:
Possibly the root of the problem is that we/I didn't think of cases where the
primary key is an external toast datum - in moast scenarios you'd an error
about a too wide index tuple. But of course that neglects cases where toasting
happens due to SET STORAGE or due to the aggregate tuple width, rather than
individual column width.
That seems consistent with what's been described on this thread so
far, but I still don't quite understand why the logic that reassembles
TOAST chunks doesn't solve it. I mean, decoding doesn't know whether
any changes are happening on the subscriber side, so it's not like we
can (a) query for the row and then (b) decide to reassemble TOAST
chunks if we find it, or something like that. The decoding has to say,
well, here's the new tuple and the old key columns, and then the
subscriber does whatever it does. I guess it could check whether the
old and new values are identical to decide whether to drop that column
out of the result, but it shouldn't compare a TOAST pointer to a
detoasted value and go "yeah, that looks equal"....
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2022-01-24 16:31:08 -0500, Robert Haas wrote:
That seems consistent with what's been described on this thread so
far, but I still don't quite understand why the logic that reassembles
TOAST chunks doesn't solve it.
There are no toast chunks to reassemble if the update didn't change the
primary key. So this just hits the path we'd also hit for an unchanged toasted
non-key column.
On Mon, Jan 24, 2022 at 4:42 PM Andres Freund <andres@anarazel.de> wrote:
On 2022-01-24 16:31:08 -0500, Robert Haas wrote:
That seems consistent with what's been described on this thread so
far, but I still don't quite understand why the logic that reassembles
TOAST chunks doesn't solve it.There are no toast chunks to reassemble if the update didn't change the
primary key. So this just hits the path we'd also hit for an unchanged toasted
non-key column.
Oh. Hmm. That's bad.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jan 25, 2022 at 12:26 AM Euler Taveira <euler@eulerto.com> wrote:
On Mon, Jan 24, 2022, at 12:58 AM, Michael Paquier wrote:
FWIW, I find the API changes of HeapDetermineModifiedColumns() and
ExtractReplicaIdentity() a bit grotty. Shouldn't we try to flatten
the old tuple instead? There are things like
toast_flatten_tuple_to_datum() for this purpose if a tuple satisfies
HeapTupleHasExternal(), or just heap_copy_tuple_as_datum().I checked v4 and I don't like the HeapDetermineModifiedColumns() and
heap_tuple_attr_equals() changes either. It seems it is hijacking these
functions to something else. I would suggest to change the signature tostatic void
heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
HeapTuple tup1, HeapTuple tup2,
bool *is_equal, bool *key_has_external);and
static void
HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
HeapTuple oldtup, HeapTuple newtup,
Bitmapset *modified_attrs, bool *key_has_external);I didn't figure out a cheap way to check if the key has external value other
than slightly modifying the HeapDetermineModifiedColumns() function and its
subroutine heap_tuple_attr_equals().
I am not sure if your proposal is much different compared to v4 or how
much it improves the situation? I see you didn't consider
'check_external_attr' parameter and I think that is important to know
if the key has any external toast value. Overall, I see your point
that the change of APIs looks a bit ugly. But, I guess that is more
due to their names and current purpose. I think it could be better if
we bring all the code of heap_tuple_attr_equals in its only caller
HeapDetermineModifiedColumns or at least part of the code where we get
attr value and can determine whether the value is stored externally.
Then change name of HeapDetermineModifiedColumns to
HeapDetermineColumnsInfo with additional parameters.
As Alvaro said I don't think adding
HeapTupleHasExternal() (as in v3) is a good idea because it does not optimize
genuine cases such as a table whose PK is an integer and contains a single
TOAST column.Another suggestion is to keep key_changed and add another attribute
(key_has_external) to ExtractReplicaIdentity(). If we need key_changed in the
future we'll have to decompose it again.
True, but we can make the required changes at that point as well.
OTOH, we can do what you are suggesting as well but I am not sure if
that is required.
--
With Regards,
Amit Kapila.
On Tue, Jan 25, 2022 at 11:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 25, 2022 at 12:26 AM Euler Taveira <euler@eulerto.com> wrote:
I am not sure if your proposal is much different compared to v4 or how
much it improves the situation? I see you didn't consider
'check_external_attr' parameter and I think that is important to know
if the key has any external toast value. Overall, I see your point
that the change of APIs looks a bit ugly. But, I guess that is more
due to their names and current purpose. I think it could be better if
we bring all the code of heap_tuple_attr_equals in its only caller
HeapDetermineModifiedColumns or at least part of the code where we get
attr value and can determine whether the value is stored externally.
Then change name of HeapDetermineModifiedColumns to
HeapDetermineColumnsInfo with additional parameters.
I think the best way is to do some refactoring and renaming of the
function, because as part of HeapDetermineModifiedColumns we are
already processing the tuple so we can not put extra overhead of
reprocessing it again. In short I like the idea of renaming the
HeapDetermineModifiedColumns and moving part of heap_tuple_attr_equals
code into the caller. Here is the patch set for the same. I have
divided it into two patches which can eventually be merged, 0001- for
refactoring 0002- does the actual work.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v5-0002-WAL-log-unchanged-toasted-eplica-identity-key-att.patchtext/x-patch; charset=US-ASCII; name=v5-0002-WAL-log-unchanged-toasted-eplica-identity-key-att.patchDownload
From db839d20ab109488670de8b480b9f30f4f30c084 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 28 Jan 2022 11:35:12 +0530
Subject: [PATCH v5 2/2] WAL log unchanged toasted eplica identity key
attributes
Currently, if the unchanged replica identity key attributes are not
logged separately because they are getting logged as part of the
new tuple. But if it is stored externally then the untoasted value
is not getting logged as part of the new tuple. So we need to log
that as part of the old_key_tuple. In order to do that we need to
identify whether any of the unchanged replica identity key attributes
is stored externally or not. So, we are trying to identify this
while identifying the modified attributes because at that time we
are already processing the tuple so there will not be any extra
overhead.
---
contrib/test_decoding/expected/toast.out | 2 +-
src/backend/access/heap/heapam.c | 77 ++++++++++++++++++++++++--------
2 files changed, 60 insertions(+), 19 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 86eb016..ab17f2e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -78,9 +78,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *check_external_attr,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -106,7 +108,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
static void index_delete_sort(TM_IndexDeleteOp *delstate);
static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3184,6 +3186,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3262,9 +3265,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. And also identify whether
+ * any of the unmodified replica identity key attributes is externally
+ * stored or not. We need to identify that because if the replica identity
+ * key attribute is unchanged and it is externally stored then its
+ * untoasted value will not be WAL logged as part of the new tuple so we
+ * will have to do it as part of the old_key_tuple.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3864,10 +3875,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4073,12 +4086,20 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
+ * check_external_attr is bitmapset of the attribute from the old tuple for
+ * which we want to check whether any of unmodified attribute contains
+ * externally stored attribute. And, if so then has_external will be set to
+ * true.
+ *
* The input bitmapset is destructively modified; that is OK since this is
* invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *check_external_attr,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
int attrnum;
Bitmapset *modified = NULL;
@@ -4123,9 +4144,28 @@ HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
if (!heap_tuple_attr_equals(RelationGetDescr(relation), attrnum,
value1, value2, isnull1, isnull2))
+ {
modified = bms_add_member(modified,
attrnum -
FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * If attribute is NULL or attlen is no -1 then it can not be stored
+ * externally.
+ */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * If the attribute is stored externally and is a member of the
+ * check_external_attr bitmapset then set has_external to true.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ check_external_attr))
+ *has_external = true;
}
return modified;
@@ -8362,14 +8402,15 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * key_required should be false if caller knows that no replica identity
+ * columns changed value and it doesn't has any external data.
+ * It's always true in the DELETE case.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -8402,7 +8443,7 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
}
/* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -8410,10 +8451,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes constant
+ * true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
v5-0001-Code-refactoring-for-the-next-patch.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Code-refactoring-for-the-next-patch.patchDownload
From d2281c5b2e61a382d4616d780bb5cc69656ef087 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 28 Jan 2022 10:44:22 +0530
Subject: [PATCH v5 1/2] Code refactoring for the next patch
Maybe this can be merged with the next patch but kept separate for
ease of review
---
src/backend/access/heap/heapam.c | 81 +++++++++++++++++++++-------------------
1 file changed, 42 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98230aa..86eb016 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4028,43 +4028,11 @@ l2:
*/
static bool
heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+ Datum value1, Datum value2, bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -4112,17 +4080,52 @@ static Bitmapset *
HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
HeapTuple oldtup, HeapTuple newtup)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ continue;
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ continue;
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient if
+ * there are many indexed columns. Should HeapDetermineModifiedColumns do
+ * a single heap_deform_tuple call on each tuple, instead? But that
+ * doesn't work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ if (!heap_tuple_attr_equals(RelationGetDescr(relation), attrnum,
+ value1, value2, isnull1, isnull2))
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
}
return modified;
--
1.8.3.1
On Fri, Jan 28, 2022 at 12:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I think the best way is to do some refactoring and renaming of the
function, because as part of HeapDetermineModifiedColumns we are
already processing the tuple so we can not put extra overhead of
reprocessing it again. In short I like the idea of renaming the
HeapDetermineModifiedColumns and moving part of heap_tuple_attr_equals
code into the caller. Here is the patch set for the same. I have
divided it into two patches which can eventually be merged, 0001- for
refactoring 0002- does the actual work.
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ continue;
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ continue;
+ }
These two cases need to be considered as the corresponding attribute
is modified, so the attnum needs to be added in the bitmapset of
modified attrs.
I have changed this and various other comments in the patch. I have
modified the docs as well to reflect the changes. I thought of adding
a test but I think the current test in toast.sql seems sufficient.
Kindly let me know what you think of the attached? I think we should
backpatch this till v10. What do you think?
Does anyone else have better ideas to fix this?
--
With Regards,
Amit Kapila.
Attachments:
v6-0001-WAL-log-unchanged-toasted-replica-identity-key-at.patchapplication/octet-stream; name=v6-0001-WAL-log-unchanged-toasted-replica-identity-key-at.patchDownload
From 887e8e061f7a242f6586af470210c66605d56a1a Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 29 Jan 2022 15:16:20 +0530
Subject: [PATCH v7] WAL log unchanged toasted replica identity key attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 6 +-
src/backend/access/heap/heapam.c | 178 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 67 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index c8b16c1..1a93f2b 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -873,8 +873,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
This form changes the information which is written to the write-ahead log
to identify rows which are updated or deleted. This option has no effect
except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ In all cases except toasted values, no old values are logged unless at
+ least one of the columns that would be logged differs between the old and
+ new versions of the row. We detoast the unchanged old toast values and log
+ it.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98230aa..7cbf73b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -78,9 +78,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *check_external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -106,7 +108,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
static void index_delete_sort(TM_IndexDeleteOp *delstate);
static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3184,6 +3186,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3251,7 +3254,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3262,9 +3265,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3864,10 +3875,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4023,48 +4036,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1,
+ Datum value2, bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -4105,24 +4086,92 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of check_external_cols
+ * and is stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *check_external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_tuple_attr_equals(RelationGetDescr(relation), attrnum,
+ value1, value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of check_external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ check_external_cols))
+ *has_external = true;
}
return modified;
@@ -8359,14 +8408,15 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * key_required should be false if caller knows that no replica identity
+ * columns changed value and it doesn't has any external data. It's always
+ * true in the DELETE case.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -8398,8 +8448,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -8407,10 +8457,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
On Sat, Jan 29, 2022 at 3:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 28, 2022 at 12:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ /* + * If it's a whole-tuple reference, say "not equal". It's not really + * worth supporting this case, since it could only succeed after a + * no-op update, which is hardly a case worth optimizing for. + */ + if (attrnum == 0) + continue; + + /* + * Likewise, automatically say "not equal" for any system attribute + * other than tableOID; we cannot expect these to be consistent in a + * HOT chain, or even to be set correctly yet in the new tuple. + */ + if (attrnum < 0) + { + if (attrnum != TableOidAttributeNumber) + continue; + }These two cases need to be considered as the corresponding attribute
is modified, so the attnum needs to be added in the bitmapset of
modified attrs.
Yeah right.
I have changed this and various other comments in the patch. I have
modified the docs as well to reflect the changes. I thought of adding
a test but I think the current test in toast.sql seems sufficient.
Kindly let me know what you think of the attached? I think we should
backpatch this till v10. What do you think?
Looks fine to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 31, 2022 at 9:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have changed this and various other comments in the patch. I have
modified the docs as well to reflect the changes. I thought of adding
a test but I think the current test in toast.sql seems sufficient.
Kindly let me know what you think of the attached? I think we should
backpatch this till v10. What do you think?Looks fine to me.
Attached are the patches for back-branches till v10. I have made two
modifications: (a) changed heap_tuple_attr_equals() to
heap_attr_equals() as we are not passing tuple to it; (b) changed
parameter name 'check_external_cols' to 'external_cols' to make it
sound similar to existing parameter 'interesting_cols' in
HeapDetermine* function.
Let me know what you think of the attached? Do you see any reason not
to back-patch this fix?
--
With Regards,
Amit Kapila.
Attachments:
13-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchapplication/octet-stream; name=13-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchDownload
From cc9207b12ce20a49943304b786df78b1ae32455b Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 11:51:05 +0530
Subject: [PATCH v13] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 6 +-
src/backend/access/heap/heapam.c | 178 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 67 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index ca95939..abfbd88 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -821,8 +821,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
This form changes the information which is written to the write-ahead log
to identify rows which are updated or deleted. This option has no effect
except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ In all cases except toasted values, no old values are logged unless at
+ least one of the columns that would be logged differs between the old and
+ new versions of the row. We detoast the unchanged old toast values and log
+ them.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0807b40..092a415 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,9 +76,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -102,7 +104,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -2910,6 +2912,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -2994,7 +2997,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3005,9 +3008,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3609,10 +3620,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -3768,48 +3781,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -3850,24 +3831,92 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -7620,14 +7669,15 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * key_required should be false if caller knows that no replica identity
+ * columns changed value and it doesn't has any external data. It's always
+ * true in the DELETE case.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -7659,8 +7709,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -7668,10 +7718,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
HEAD-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-at.patchapplication/octet-stream; name=HEAD-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-at.patchDownload
From e1a5d309b3708da7d4d9e086a90f63bad2541db0 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 10:30:32 +0530
Subject: [PATCH v8] WAL log unchanged toasted replica identity key attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 6 +-
src/backend/access/heap/heapam.c | 178 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 67 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index dee026e..d67ef7c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -873,8 +873,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
This form changes the information which is written to the write-ahead log
to identify rows which are updated or deleted. This option has no effect
except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ In all cases except toasted values, no old values are logged unless at
+ least one of the columns that would be logged differs between the old and
+ new versions of the row. We detoast the unchanged old toast values and log
+ them.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98230aa..4d80d35 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -78,9 +78,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -106,7 +108,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
static void index_delete_sort(TM_IndexDeleteOp *delstate);
static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3184,6 +3186,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3251,7 +3254,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3262,9 +3265,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3864,10 +3875,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4023,48 +4036,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -4105,24 +4086,92 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -8359,14 +8408,15 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * key_required should be false if caller knows that no replica identity
+ * columns changed value and it doesn't has any external data. It's always
+ * true in the DELETE case.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -8398,8 +8448,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -8407,10 +8457,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
14-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchapplication/octet-stream; name=14-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchDownload
From 8efed5ed8cac8a36bda3dd37bce2fb420e82e456 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 11:04:34 +0530
Subject: [PATCH v14] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 6 +-
src/backend/access/heap/heapam.c | 178 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 67 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index d4f2a73..a3e14b2 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -852,8 +852,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
This form changes the information which is written to the write-ahead log
to identify rows which are updated or deleted. This option has no effect
except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ In all cases except toasted values, no old values are logged unless at
+ least one of the columns that would be logged differs between the old and
+ new versions of the row. We detoast the unchanged old toast values and log
+ them.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 114a969..d85a019 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -78,9 +78,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -106,7 +108,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
static void index_delete_sort(TM_IndexDeleteOp *delstate);
static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3185,6 +3187,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3269,7 +3272,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3280,9 +3283,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3883,10 +3894,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4042,48 +4055,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -4124,24 +4105,92 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -8306,14 +8355,15 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * key_required should be false if caller knows that no replica identity
+ * columns changed value and it doesn't has any external data. It's always
+ * true in the DELETE case.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -8345,8 +8395,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -8354,10 +8404,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
12-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchapplication/octet-stream; name=12-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchDownload
From 3d2ea51fda6a9f7cfba5069418e9b0cf04c9d7cb Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 12:12:40 +0530
Subject: [PATCH v12] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 6 +-
src/backend/access/heap/heapam.c | 178 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 67 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index c572f75..3ef87a8 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -800,8 +800,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
This form changes the information which is written to the write-ahead log
to identify rows which are updated or deleted. This option has no effect
except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ In all cases except toasted values, no old values are logged unless at
+ least one of the columns that would be logged differs between the old and
+ new versions of the row. We detoast the unchanged old toast values and log
+ them.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 24d3932..3f0f543 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,9 +76,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -102,7 +104,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -2941,6 +2943,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3025,7 +3028,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3036,9 +3039,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3639,10 +3650,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -3798,48 +3811,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -3880,24 +3861,92 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -7661,14 +7710,15 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * key_required should be false if caller knows that no replica identity
+ * columns changed value and it doesn't has any external data. It's always
+ * true in the DELETE case.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -7700,8 +7750,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -7709,10 +7759,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
11-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchapplication/octet-stream; name=11-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchDownload
From 7fb435da40b61dcd4a2e75c767f88bbee36c6ae4 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 16:30:33 +0530
Subject: [PATCH v11] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 6 +-
src/backend/access/heap/heapam.c | 173 +++++++++++++++--------
3 files changed, 118 insertions(+), 63 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d50a..a757e7dc8d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 3e18b7dba5..b526a0e26b 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -815,8 +815,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
This form changes the information which is written to the write-ahead log
to identify rows which are updated or deleted. This option has no effect
except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ In all cases except toasted values, no old values are logged unless at
+ least one of the columns that would be logged differs between the old and
+ new versions of the row. We detoast the unchanged old toast values and log
+ them.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 834435cdf7..1a791e0b08 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -101,9 +101,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -127,7 +129,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
static bool ProjIndexIsUnchanged(Relation relation, HeapTuple oldtup, HeapTuple newtup);
@@ -3568,6 +3570,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3652,7 +3655,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3678,9 +3681,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(!(newtup->t_data->t_infomask & HEAP_HASOID));
}
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -4266,10 +4277,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4426,48 +4439,15 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
- /*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than OID and tableOID; we cannot expect these to be consistent in a HOT
- * chain, or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != ObjectIdAttributeNumber &&
- attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
/*
* If one value is NULL and other is not, then they are certainly not
* equal
@@ -4589,24 +4569,93 @@ ProjIndexIsUnchanged(Relation relation, HeapTuple oldtup, HeapTuple newtup)
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than OID and tableOID; we cannot expect these to be consistent
+ * in a HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != ObjectIdAttributeNumber &&
+ attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -8116,9 +8165,13 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
*
* Returns NULL if there's no need to log an identity or if there's no suitable
* key in the Relation relation.
+ *
+ * key_required should be false if caller knows that no replica identity
+ * columns changed value and it doesn't has any external data. It's always
+ * true in the DELETE case.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *copy)
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required, bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
Oid replidindex;
@@ -8151,8 +8204,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find the replica identity index */
--
2.28.0.windows.1
10-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchapplication/octet-stream; name=10-v7-0001-WAL-log-unchanged-toasted-replica-identity-key-a.patchDownload
From ac4b5e7d5715921b238abcf26896dc6f0cb843a9 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 17:37:31 +0530
Subject: [PATCH v10] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 6 +-
src/backend/access/heap/heapam.c | 173 ++++++++++++++++++++-----------
3 files changed, 118 insertions(+), 63 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 91a9a1e..a8e3676 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 1279c19..31c27d0 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -738,8 +738,10 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
This form changes the information which is written to the write-ahead log
to identify rows which are updated or deleted. This option has no effect
except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ In all cases except toasted values, no old values are logged unless at
+ least one of the columns that would be logged differs between the old and
+ new versions of the row. We detoast the unchanged old toast values and log
+ them.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 144bbcd..a39c393 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -96,9 +96,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -122,7 +124,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3510,6 +3512,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3594,7 +3597,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3620,9 +3623,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(!(newtup->t_data->t_infomask & HEAP_HASOID));
}
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -4201,10 +4212,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4360,49 +4373,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than OID and tableOID; we cannot expect these to be consistent in a HOT
- * chain, or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != ObjectIdAttributeNumber &&
- attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -4443,24 +4423,93 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than OID and tableOID; we cannot expect these to be consistent
+ * in a HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != ObjectIdAttributeNumber &&
+ attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -7964,9 +8013,13 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
*
* Returns NULL if there's no need to log an identity or if there's no suitable
* key in the Relation relation.
+ *
+ * key_required should be false if caller knows that no replica identity
+ * columns changed value and it doesn't has any external data. It's always
+ * true in the DELETE case.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *copy)
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required, bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
Oid replidindex;
@@ -8000,8 +8053,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find the replica identity index */
--
1.8.3.1
I don't have a reason not to commit this patch. I have some suggestions
on the comments and docs though.
@@ -8359,14 +8408,15 @@ log_heap_new_cid(Relation relation, HeapTuple tup) * Returns NULL if there's no need to log an identity or if there's no suitable * key defined. * - * key_changed should be false if caller knows that no replica identity - * columns changed value. It's always true in the DELETE case. + * key_required should be false if caller knows that no replica identity + * columns changed value and it doesn't has any external data. It's always + * true in the DELETE case. * * *copy is set to true if the returned tuple is a modified copy rather than * the same tuple that was passed in. */ static HeapTuple -ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, +ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
I find the new comment pretty hard to interpret. I would say something
like "Pass key_required true if any replica identity columns changed
value, or if any of them have external data. DELETE must always pass
true".
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index dee026e..d67ef7c 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -873,8 +873,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM This form changes the information which is written to the write-ahead log to identify rows which are updated or deleted. This option has no effect except when logical replication is in use. - In all cases, no old values are logged unless at least one of the columns - that would be logged differs between the old and new versions of the row. + In all cases except toasted values, no old values are logged unless at + least one of the columns that would be logged differs between the old and + new versions of the row. We detoast the unchanged old toast values and log + them.
Here we're patching with a minimal wording change with almost
incomprehensible results. I think we should patch more extensively.
I suggest:
This form changes the information which is written to the
write-ahead log to identify rows which are updated or deleted.
In most cases, the old value of each column is only logged if
it differs from the new value; however, if the old value is
stored externally, it is always logged regardless of whether it
changed.
This option has no effect unless logical replication is in use.
I didn't get a chance to review the code, but I think this is valuable.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
On Fri, Feb 4, 2022 at 9:06 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I don't have a reason not to commit this patch.
It is not very clear to me from this so just checking again, are you
fine with back-patching this as well?
I have some suggestions
on the comments and docs though.
Thanks, your suggestions look good to me. I'll take care of these in
the next version.
--
With Regards,
Amit Kapila.
On 2022-Feb-05, Amit Kapila wrote:
On Fri, Feb 4, 2022 at 9:06 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I don't have a reason not to commit this patch.
It is not very clear to me from this so just checking again, are you
fine with back-patching this as well?
Hmm, of course, I never thought it'd be a good idea to let the bug
unfixed in back branches.
Thanks!
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El sudor es la mejor cura para un pensamiento enfermo" (Bardia)
Hi,
On 2022-02-04 17:45:36 +0530, Amit Kapila wrote:
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out index cd03e9d..a757e7d 100644 --- a/contrib/test_decoding/expected/toast.out +++ b/contrib/test_decoding/expected/toast.out @@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123 COMMIT BEGIN - table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109 + table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
Hm. This looks weird. What happened to the the change to toasted_col2 that was
in the "removed" line?
This corresponds to the following statement I think:
-- test update of a toasted key without changing it
UPDATE toasted_key SET toasted_col2 = toasted_col1;
which previously was inserted as:
INSERT INTO toasted_key(toasted_key, toasted_col1) VALUES(repeat('1234567890', 200), repeat('9876543210', 200));
so toasted_col2 should have changed from NULL to repeat('9876543210', 200)
Am I misreading something?
Greetings,
Andres Freund
On Sun, Feb 6, 2022 at 5:04 AM Andres Freund <andres@anarazel.de> wrote:
On 2022-02-04 17:45:36 +0530, Amit Kapila wrote:
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out index cd03e9d..a757e7d 100644 --- a/contrib/test_decoding/expected/toast.out +++ b/contrib/test_decoding/expected/toast.out @@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123 COMMIT BEGIN - table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109 + table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678Hm. This looks weird. What happened to the the change to toasted_col2 that was
in the "removed" line?This corresponds to the following statement I think:
-- test update of a toasted key without changing it
UPDATE toasted_key SET toasted_col2 = toasted_col1;
which previously was inserted as:
INSERT INTO toasted_key(toasted_key, toasted_col1) VALUES(repeat('1234567890', 200), repeat('9876543210', 200));so toasted_col2 should have changed from NULL to repeat('9876543210', 200)
Right, and it is getting changed. We are just printing the first 200
characters (by using SQL [1]SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');) from the decoded tuple so what is shown
in the results is the initial 200 bytes. The complete decoded data
after the patch is as follows:
table public.toasted_key: UPDATE: old-key:
toasted_key[text]:'12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'
new-tuple: id[integer]:2 toasted_key[text]:unchanged-toast-datum
toasted_col1[text]:unchanged-toast-datum
toasted_col2[text]:'98765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210'
So, in the result, the initial 200 bytes contain data of old-key which
is what we expect.
[1]: SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
pg_logical_slot_get_changes('regression_slot', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1');
--
With Regards,
Amit Kapila.
On Sat, Feb 5, 2022 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 4, 2022 at 9:06 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I have some suggestions
on the comments and docs though.Thanks, your suggestions look good to me. I'll take care of these in
the next version.
Attached please find the modified patches.
--
With Regards,
Amit Kapila.
Attachments:
HEAD-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=HEAD-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From a3a8703ecfb51e2068467204ce0a2153a909601e Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 10:30:32 +0530
Subject: [PATCH v815] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 177 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 69 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index dee026e..5c0735e 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -871,10 +871,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98230aa..593ee3a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -78,9 +78,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -106,7 +108,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
static void index_delete_sort(TM_IndexDeleteOp *delstate);
static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3184,6 +3186,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3251,7 +3254,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3262,9 +3265,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3864,10 +3875,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4023,48 +4036,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -4105,24 +4086,92 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -8359,14 +8408,14 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -8398,8 +8447,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -8407,10 +8456,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
14-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=14-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From 34123616e72c6c28453c23bf45ac8c30ba592fd0 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 11:04:34 +0530
Subject: [PATCH v814] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 177 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 69 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index d4f2a73..8933bb0 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -850,10 +850,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 114a969..180ed61 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -78,9 +78,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -106,7 +108,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
static void index_delete_sort(TM_IndexDeleteOp *delstate);
static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3185,6 +3187,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3269,7 +3272,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3280,9 +3283,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3883,10 +3894,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4042,48 +4055,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -4124,24 +4105,92 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -8306,14 +8355,14 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -8345,8 +8394,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -8354,10 +8403,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
13-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=13-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From 5b1a368cb9efdd4b2e741965dd34a43df461365d Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 11:51:05 +0530
Subject: [PATCH v813] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 177 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 69 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index ca95939..2eced5d 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -819,10 +819,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0807b40..e788e9f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,9 +76,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -102,7 +104,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -2910,6 +2912,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -2994,7 +2997,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3005,9 +3008,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3609,10 +3620,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -3768,48 +3781,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -3850,24 +3831,92 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -7620,14 +7669,14 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -7659,8 +7708,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -7668,10 +7717,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
12-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=12-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From 23ff0fda4f951b2b2f15fe21d58706abb4ff5345 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 12:12:40 +0530
Subject: [PATCH v812] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 177 ++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 69 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index c572f75..327aedc 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -798,10 +798,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 24d3932..0b54542 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,9 +76,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -102,7 +104,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -2941,6 +2943,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3025,7 +3028,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3036,9 +3039,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3639,10 +3650,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -3798,48 +3811,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -3880,24 +3861,92 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -7661,14 +7710,14 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -7700,8 +7749,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -7709,10 +7758,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
1.8.3.1
11-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=11-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From 0279eaa79022c44273831deca8568c48a1f1ca31 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 16:30:33 +0530
Subject: [PATCH v811] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 172 ++++++++++++++++++++-----------
3 files changed, 118 insertions(+), 65 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d..a757e7d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 3e18b7d..8564adb 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -813,10 +813,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 834435c..59543b7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -101,9 +101,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -127,7 +129,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
static bool ProjIndexIsUnchanged(Relation relation, HeapTuple oldtup, HeapTuple newtup);
@@ -3568,6 +3570,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3652,7 +3655,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3678,9 +3681,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(!(newtup->t_data->t_infomask & HEAP_HASOID));
}
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -4266,10 +4277,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4426,49 +4439,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than OID and tableOID; we cannot expect these to be consistent in a HOT
- * chain, or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != ObjectIdAttributeNumber &&
- attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -4589,24 +4569,93 @@ ProjIndexIsUnchanged(Relation relation, HeapTuple oldtup, HeapTuple newtup)
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than OID and tableOID; we cannot expect these to be consistent
+ * in a HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != ObjectIdAttributeNumber &&
+ attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -8116,9 +8165,12 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
*
* Returns NULL if there's no need to log an identity or if there's no suitable
* key in the Relation relation.
+ *
+ * Pass key_required true if any replica identity columns change value, or if
+ * any of them have any external data. Delete must always pass true.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *copy)
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required, bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
Oid replidindex;
@@ -8151,8 +8203,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find the replica identity index */
--
1.8.3.1
10-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=10-v8-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From 3733c699a11c44a71e666290ee1784dcf17b52c5 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 17:37:31 +0530
Subject: [PATCH v810] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 172 ++++++++++++++++++++-----------
3 files changed, 118 insertions(+), 65 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 91a9a1e..a8e3676 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 1279c19..0295d8f 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -736,10 +736,11 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 144bbcd..fa48b2a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -96,9 +96,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -122,7 +124,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3510,6 +3512,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3594,7 +3597,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3620,9 +3623,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(!(newtup->t_data->t_infomask & HEAP_HASOID));
}
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -4201,10 +4212,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4360,49 +4373,16 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
/*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than OID and tableOID; we cannot expect these to be consistent in a HOT
- * chain, or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != ObjectIdAttributeNumber &&
- attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
- /*
* If one value is NULL and other is not, then they are certainly not
* equal
*/
@@ -4443,24 +4423,93 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than OID and tableOID; we cannot expect these to be consistent
+ * in a HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != ObjectIdAttributeNumber &&
+ attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -7964,9 +8013,12 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
*
* Returns NULL if there's no need to log an identity or if there's no suitable
* key in the Relation relation.
+ *
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *copy)
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required, bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
Oid replidindex;
@@ -8000,8 +8052,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find the replica identity index */
--
1.8.3.1
On Mon, Feb 7, 2022 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Feb 5, 2022 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 4, 2022 at 9:06 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I have some suggestions
on the comments and docs though.Thanks, your suggestions look good to me. I'll take care of these in
the next version.Attached please find the modified patches.
I have looked into the latest modification and back branch patches and
they look fine to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,
On 2022-02-07 08:44:00 +0530, Amit Kapila wrote:
Right, and it is getting changed. We are just printing the first 200
characters (by using SQL [1]) from the decoded tuple so what is shown
in the results is the initial 200 bytes.
Ah, I knew I must have been missing something.
The complete decoded data after the patch is as follows:
Hm. I think we should change the way the strings are shortened - otherwise we
don't really verify much in that test. Perhaps we could just replace the long
repetitive strings with something shorter in the output?
E.g. using something like regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}','g')
inside the substr().
Wonder if we should deduplicate the number of different toasted strings in the
file to something that'd allow us to have a single "redact_toast" function or
such. There's too many different ones to have a reasonbly simple redaction
function right now. But that's perhaps better done separately.
Greetings,
Andres Freund
On Tue, Feb 8, 2022 at 12:48 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-02-07 08:44:00 +0530, Amit Kapila wrote:
Right, and it is getting changed. We are just printing the first 200
characters (by using SQL [1]) from the decoded tuple so what is shown
in the results is the initial 200 bytes.Ah, I knew I must have been missing something.
The complete decoded data after the patch is as follows:
Hm. I think we should change the way the strings are shortened - otherwise we
don't really verify much in that test. Perhaps we could just replace the long
repetitive strings with something shorter in the output?E.g. using something like regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}','g')
inside the substr().
IMHO, in this particular case using regexp_replace as you explained
would be a good option as we will be verifying complete data instead
of just the first 200 characters.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Feb 8, 2022 at 12:48 AM Andres Freund <andres@anarazel.de> wrote:
On 2022-02-07 08:44:00 +0530, Amit Kapila wrote:
Right, and it is getting changed. We are just printing the first 200
characters (by using SQL [1]) from the decoded tuple so what is shown
in the results is the initial 200 bytes.Ah, I knew I must have been missing something.
The complete decoded data after the patch is as follows:
Hm. I think we should change the way the strings are shortened - otherwise we
don't really verify much in that test. Perhaps we could just replace the long
repetitive strings with something shorter in the output?E.g. using something like regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}','g')
inside the substr().
This sounds like a good idea. Shall we do this as part of this patch
itself or as a separate improvement?
Wonder if we should deduplicate the number of different toasted strings in the
file to something that'd allow us to have a single "redact_toast" function or
such. There's too many different ones to have a reasonbly simple redaction
function right now.
I think this is also worth trying.
But that's perhaps better done separately.
+1.
--
With Regards,
Amit Kapila.
On Mon, Feb 7, 2022 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Feb 5, 2022 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 4, 2022 at 9:06 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I have some suggestions
on the comments and docs though.Thanks, your suggestions look good to me. I'll take care of these in
the next version.Attached please find the modified patches.
Thanks for your patch. I tried it and it works well.
Two small comments:
1)
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *id_has_external);
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
The declaration and the definition of this function use different parameter
names for the last parameter (id_has_external and has_external), it's better to
be consistent.
2)
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
If has_external is already true, it seems we don't need this check, so should we
check has_external first?
Regards,
Tang
On Tue, Feb 8, 2022, at 10:18 PM, tanghy.fnst@fujitsu.com wrote:
2) + /* + * Check if the old tuple's attribute is stored externally and is a + * member of external_cols. + */ + if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) && + bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber, + external_cols)) + *has_external = true;If has_external is already true, it seems we don't need this check, so should we
check has_external first?
Is it worth it? I don't think so. It complicates a non-critical path. In
general, the condition will be executed once or twice.
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Wed, Feb 9, 2022 at 7:16 AM Euler Taveira <euler@eulerto.com> wrote:
On Tue, Feb 8, 2022, at 10:18 PM, tanghy.fnst@fujitsu.com wrote:
2) + /* + * Check if the old tuple's attribute is stored externally and is a + * member of external_cols. + */ + if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) && + bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber, + external_cols)) + *has_external = true;If has_external is already true, it seems we don't need this check, so should we
check has_external first?Is it worth it? I don't think so.
I also don't think it is worth adding such a check.
--
With Regards,
Amit Kapila.
On Tue, Feb 8, 2022 3:18 AM Andres Freund <andres@anarazel.de> wrote:
On 2022-02-07 08:44:00 +0530, Amit Kapila wrote:
Right, and it is getting changed. We are just printing the first 200
characters (by using SQL [1]) from the decoded tuple so what is shown
in the results is the initial 200 bytes.Ah, I knew I must have been missing something.
The complete decoded data after the patch is as follows:
Hm. I think we should change the way the strings are shortened - otherwise we
don't really verify much in that test. Perhaps we could just replace the long
repetitive strings with something shorter in the output?E.g. using something like regexp_replace(data,
'(1234567890|9876543210){200}', '\1{200}','g')
inside the substr().Wonder if we should deduplicate the number of different toasted strings in the
file to something that'd allow us to have a single "redact_toast" function or
such. There's too many different ones to have a reasonbly simple redaction
function right now. But that's perhaps better done separately.
I tried to make the output shorter using your suggestion like the following SQL,
please see the attached patch, which is based on v8 patch[1]/messages/by-id/CAA4eK1L_Z_2LDwMNbGrwoO+Fc-2Q04YORQSA9UfGUTMQpy2O1Q@mail.gmail.com.
SELECT substr(regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}','g'), 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
Note that some strings are still longer than 200 characters even though they have
been shorter, so they can't be shown entirely.
e.g.
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[te
The entire string is:
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'9876543210{200}'
Maybe it's better to change the substr length to 250 to show the entire string, or we
can do it as separate HEAD only improvement where we can deduplicate some of the
other long strings as well. Thoughts?
[1]: /messages/by-id/CAA4eK1L_Z_2LDwMNbGrwoO+Fc-2Q04YORQSA9UfGUTMQpy2O1Q@mail.gmail.com
Regards,
Tang
Attachments:
improve_toast_test.diffapplication/octet-stream; name=improve_toast_test.diffDownload
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index a757e7dc8d..9af9bb3b34 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -52,7 +52,7 @@ CREATE TABLE toasted_copy (
);
ALTER TABLE toasted_copy ALTER COLUMN data SET STORAGE EXTERNAL;
\copy toasted_copy FROM STDIN
-SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT substr(regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}','g'), 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
substr
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BEGIN
@@ -74,22 +74,22 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.xpto: DELETE: id[integer]:1
COMMIT
BEGIN
- table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
+ table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890{200}' toasted_col1[text]:'9876543210{200}' toasted_col2[text]:null
COMMIT
BEGIN
- table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[te
COMMIT
BEGIN
- table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:'1234567890{200}1' toasted_col1[text]:unchanged-toast-datum toasted_col2[text]
COMMIT
BEGIN
- table public.toasted_key: DELETE: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567
+ table public.toasted_key: DELETE: toasted_key[text]:'1234567890{200}1'
COMMIT
BEGIN
table public.toasted_copy: INSERT: id[integer]:1 data[text]:'untoasted1'
- table public.toasted_copy: INSERT: id[integer]:2 data[text]:'toasted1-1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
+ table public.toasted_copy: INSERT: id[integer]:2 data[text]:'toasted1-1234567890{200}'
table public.toasted_copy: INSERT: id[integer]:3 data[text]:'untoasted2'
- table public.toasted_copy: INSERT: id[integer]:4 data[text]:'toasted2-1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
+ table public.toasted_copy: INSERT: id[integer]:4 data[text]:'toasted2-1234567890{200}'
table public.toasted_copy: INSERT: id[integer]:5 data[text]:'untoasted3'
table public.toasted_copy: INSERT: id[integer]:6 data[text]:'untoasted4'
table public.toasted_copy: INSERT: id[integer]:7 data[text]:'untoasted5'
@@ -286,7 +286,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_copy: INSERT: id[integer]:198 data[text]:'untoasted196'
table public.toasted_copy: INSERT: id[integer]:199 data[text]:'untoasted197'
table public.toasted_copy: INSERT: id[integer]:200 data[text]:'untoasted198'
- table public.toasted_copy: INSERT: id[integer]:201 data[text]:'toasted3-12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
+ table public.toasted_copy: INSERT: id[integer]:201 data[text]:'toasted3-1234567890{200}'
table public.toasted_copy: INSERT: id[integer]:202 data[text]:'untoasted199'
table public.toasted_copy: INSERT: id[integer]:203 data[text]:'untoasted200'
COMMIT
diff --git a/contrib/test_decoding/sql/toast.sql b/contrib/test_decoding/sql/toast.sql
index d1c560a174..ac351955e4 100644
--- a/contrib/test_decoding/sql/toast.sql
+++ b/contrib/test_decoding/sql/toast.sql
@@ -264,7 +264,7 @@ ALTER TABLE toasted_copy ALTER COLUMN data SET STORAGE EXTERNAL;
202 untoasted199
203 untoasted200
\.
-SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT substr(regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}','g'), 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- test we can decode "old" tuples bigger than the max heap tuple size correctly
DROP TABLE IF EXISTS toasted_several;
On Wed, Feb 9, 2022 at 11:08 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
On Tue, Feb 8, 2022 3:18 AM Andres Freund <andres@anarazel.de> wrote:
On 2022-02-07 08:44:00 +0530, Amit Kapila wrote:
Right, and it is getting changed. We are just printing the first 200
characters (by using SQL [1]) from the decoded tuple so what is shown
in the results is the initial 200 bytes.Ah, I knew I must have been missing something.
The complete decoded data after the patch is as follows:
Hm. I think we should change the way the strings are shortened - otherwise we
don't really verify much in that test. Perhaps we could just replace the long
repetitive strings with something shorter in the output?E.g. using something like regexp_replace(data,
'(1234567890|9876543210){200}', '\1{200}','g')
inside the substr().Wonder if we should deduplicate the number of different toasted strings in the
file to something that'd allow us to have a single "redact_toast" function or
such. There's too many different ones to have a reasonbly simple redaction
function right now. But that's perhaps better done separately.I tried to make the output shorter using your suggestion like the following SQL,
please see the attached patch, which is based on v8 patch[1].SELECT substr(regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}','g'), 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
Note that some strings are still longer than 200 characters even though they have
been shorter, so they can't be shown entirely.e.g.
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[teThe entire string is:
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'9876543210{200}'Maybe it's better to change the substr length to 250 to show the entire string, or we
can do it as separate HEAD only improvement where we can deduplicate some of the
other long strings as well. Thoughts?
I think it is better to do this as a separate HEAD-only improvement as
it can affect other tests results. We can also try to deduplicate some
of the other long strings used in toast.sql file along with it.
--
With Regards,
Amit Kapila.
On Mon, Feb 7, 2022 at 1:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Feb 7, 2022 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Attached please find the modified patches.
I have looked into the latest modification and back branch patches and
they look fine to me.
Today, while looking at this patch again, I think I see one problem
with the below change (referring pg10 patch):
+ if (attrnum < 0)
+ {
+ if (attrnum != ObjectIdAttributeNumber &&
+ attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
...
...
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
I think it is possible that we use TupleDescAttr for system attribute
(in this case ObjectIdAttributeNumber/TableOidAttributeNumber) which
will be wrong as it contains only user attributes, not system
attributes. See comments atop TupleDescData.
I think this check should be modified to if (attrnum < 0 || isnull1
|| TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1). What do you
think?
* Another minor comment:
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+ value2, isnull1, isnull2))
I think here we can directly use tupdesc instead of RelationGetDescr(relation).
--
With Regards,
Amit Kapila.
On Thu, Feb 10, 2022 9:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 7, 2022 at 1:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Feb 7, 2022 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Attached please find the modified patches.
I have looked into the latest modification and back branch patches and
they look fine to me.Today, while looking at this patch again, I think I see one problem with the below change (referring pg10 patch): + if (attrnum < 0) + { + if (attrnum != ObjectIdAttributeNumber && + attrnum != TableOidAttributeNumber) + { + modified = bms_add_member(modified, + attrnum - + FirstLowInvalidHeapAttributeNumber); + continue; + } + } ... ... + /* No need to check attributes that can't be stored externally. */ + if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1) + continue;I think it is possible that we use TupleDescAttr for system attribute
(in this case ObjectIdAttributeNumber/TableOidAttributeNumber) which
will be wrong as it contains only user attributes, not system
attributes. See comments atop TupleDescData.I think this check should be modified to if (attrnum < 0 || isnull1
|| TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1). What do you
think?
I agree with you.
* Another minor comment: + if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1, + value2, isnull1, isnull2))I think here we can directly use tupdesc instead of RelationGetDescr(relation).
+1.
Attached the patches which fixed the above two comments and the first comment in
my previous mail [1]/messages/by-id/OS0PR01MB61134DD41BE6D986B9DB80CCFB2E9@OS0PR01MB6113.jpnprd01.prod.outlook.com, the rest is the same as before.
I ran the tests on all branches, they all passed as expected.
[1]: /messages/by-id/OS0PR01MB61134DD41BE6D986B9DB80CCFB2E9@OS0PR01MB6113.jpnprd01.prod.outlook.com
Regards,
Tang
Attachments:
13-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=13-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From cfc5f8239861b013f69bafce52ec24aabae29fae Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 11:51:05 +0530
Subject: [PATCH v913] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 181 +++++++++++++++--------
3 files changed, 123 insertions(+), 69 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d50a..a757e7dc8d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index ca959391ef..2eced5d65b 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -819,10 +819,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0807b4031e..252eca8c3b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,9 +76,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -102,7 +104,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -2910,6 +2912,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -2994,7 +2997,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3005,9 +3008,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3609,10 +3620,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -3768,47 +3781,15 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
- /*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
/*
* If one value is NULL and other is not, then they are certainly not
* equal
@@ -3850,24 +3831,96 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(tupdesc, attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * No need to check attributes that can't be stored externally. Note
+ * that system attributes can't be stored externally.
+ */
+ if (attrnum < 0 || isnull1 ||
+ TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -7620,14 +7673,14 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -7659,8 +7712,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -7668,10 +7721,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
2.18.4
14-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=14-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From df870332452f43ca3c4c73c09f42204045976475 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 11:04:34 +0530
Subject: [PATCH v914] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 181 +++++++++++++++--------
3 files changed, 123 insertions(+), 69 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d50a..a757e7dc8d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index d4f2a73fc4..8933bb0d77 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -850,10 +850,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 114a96954a..3887a58bbc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -78,9 +78,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -106,7 +108,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
static void index_delete_sort(TM_IndexDeleteOp *delstate);
static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3185,6 +3187,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3269,7 +3272,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3280,9 +3283,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3883,10 +3894,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4042,47 +4055,15 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
- /*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
/*
* If one value is NULL and other is not, then they are certainly not
* equal
@@ -4124,24 +4105,96 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(tupdesc, attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * No need to check attributes that can't be stored externally. Note
+ * that system attributes can't be stored externally.
+ */
+ if (attrnum < 0 || isnull1 ||
+ TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -8306,14 +8359,14 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -8345,8 +8398,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -8354,10 +8407,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
2.18.4
HEAD-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-at.patchapplication/octet-stream; name=HEAD-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-at.patchDownload
From 8bad13f448e8968a1141bfdfa752240f9c542659 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 10:30:32 +0530
Subject: [PATCH v9] WAL log unchanged toasted replica identity key attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 181 +++++++++++++++--------
3 files changed, 123 insertions(+), 69 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d50a..a757e7dc8d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index dee026e3a0..5c0735e08a 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -871,10 +871,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98230aac49..9ae642fbeb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -78,9 +78,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -106,7 +108,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
static void index_delete_sort(TM_IndexDeleteOp *delstate);
static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_changed,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3184,6 +3186,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3251,7 +3254,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3262,9 +3265,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3864,10 +3875,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4023,47 +4036,15 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
- /*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
/*
* If one value is NULL and other is not, then they are certainly not
* equal
@@ -4105,24 +4086,96 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(tupdesc, attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * No need to check attributes that can't be stored externally. Note
+ * that system attributes can't be stored externally.
+ */
+ if (attrnum < 0 || isnull1 ||
+ TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -8359,14 +8412,14 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -8398,8 +8451,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -8407,10 +8460,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
2.18.4
10-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=10-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From bdfec4c978a1e39d13a461b1c1ffd77f36513bb2 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 17:37:31 +0530
Subject: [PATCH v910] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 176 +++++++++++++++--------
3 files changed, 122 insertions(+), 65 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index 91a9a1e86d..a8e3676f47 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 1279c19e81..0295d8fbd0 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -736,10 +736,11 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 144bbcdfe7..e76a810859 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -96,9 +96,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -122,7 +124,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -3510,6 +3512,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3594,7 +3597,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3620,9 +3623,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(!(newtup->t_data->t_infomask & HEAP_HASOID));
}
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -4201,10 +4212,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4360,48 +4373,15 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
- /*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than OID and tableOID; we cannot expect these to be consistent in a HOT
- * chain, or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != ObjectIdAttributeNumber &&
- attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
/*
* If one value is NULL and other is not, then they are certainly not
* equal
@@ -4443,24 +4423,97 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than OID and tableOID; we cannot expect these to be consistent
+ * in a HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != ObjectIdAttributeNumber &&
+ attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(tupdesc, attrnum, value1,
+ value2, isnull1, isnull2))
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * No need to check attributes that can't be stored externally. Note
+ * that system attributes can't be stored externally.
+ */
+ if (attrnum < 0 || isnull1 ||
+ TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -7964,9 +8017,12 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
*
* Returns NULL if there's no need to log an identity or if there's no suitable
* key in the Relation relation.
+ *
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *copy)
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required, bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
Oid replidindex;
@@ -8000,8 +8056,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find the replica identity index */
--
2.18.4
11-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=11-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From cc6bb494f08d0592e53b828e3ecfccd5055335f5 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 16:30:33 +0530
Subject: [PATCH v911] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 176 +++++++++++++++--------
3 files changed, 122 insertions(+), 65 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d50a..a757e7dc8d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 3e18b7dba5..8564adb3d2 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -813,10 +813,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 834435cdf7..5e971b9780 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -101,9 +101,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -127,7 +129,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
static bool ProjIndexIsUnchanged(Relation relation, HeapTuple oldtup, HeapTuple newtup);
@@ -3568,6 +3570,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3652,7 +3655,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3678,9 +3681,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(!(newtup->t_data->t_infomask & HEAP_HASOID));
}
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -4266,10 +4277,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4426,48 +4439,15 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
- /*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than OID and tableOID; we cannot expect these to be consistent in a HOT
- * chain, or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != ObjectIdAttributeNumber &&
- attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
/*
* If one value is NULL and other is not, then they are certainly not
* equal
@@ -4589,24 +4569,97 @@ ProjIndexIsUnchanged(Relation relation, HeapTuple oldtup, HeapTuple newtup)
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than OID and tableOID; we cannot expect these to be consistent
+ * in a HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != ObjectIdAttributeNumber &&
+ attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(tupdesc, attrnum, value1,
+ value2, isnull1, isnull2))
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * No need to check attributes that can't be stored externally. Note
+ * that system attributes can't be stored externally.
+ */
+ if (attrnum < 0 || isnull1 ||
+ TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -8116,9 +8169,12 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
*
* Returns NULL if there's no need to log an identity or if there's no suitable
* key in the Relation relation.
+ *
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *copy)
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required, bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
Oid replidindex;
@@ -8151,8 +8207,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find the replica identity index */
--
2.18.4
12-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchapplication/octet-stream; name=12-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patchDownload
From d41da1f2d9624710874b2be0d7a78496972dca88 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 4 Feb 2022 12:12:40 +0530
Subject: [PATCH v912] WAL log unchanged toasted replica identity key
attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
---
contrib/test_decoding/expected/toast.out | 2 +-
doc/src/sgml/ref/alter_table.sgml | 9 +-
src/backend/access/heap/heapam.c | 181 +++++++++++++++--------
3 files changed, 123 insertions(+), 69 deletions(-)
diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
index cd03e9d50a..a757e7dc8d 100644
--- a/contrib/test_decoding/expected/toast.out
+++ b/contrib/test_decoding/expected/toast.out
@@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
COMMIT
BEGIN
- table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
+ table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
COMMIT
BEGIN
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index c572f75adb..327aedc409 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -798,10 +798,11 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
This form changes the information which is written to the write-ahead log
- to identify rows which are updated or deleted. This option has no effect
- except when logical replication is in use.
- In all cases, no old values are logged unless at least one of the columns
- that would be logged differs between the old and new versions of the row.
+ to identify rows which are updated or deleted.
+ In most cases, the old value of each column is only logged if it differs
+ from the new value; however, if the old value is stored externally, it is
+ always logged regardless of whether it changed.
+ This option has no effect except when logical replication is in use.
<variablelist>
<varlistentry>
<term><literal>DEFAULT</literal></term>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 24d39327c9..bae1063e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -76,9 +76,11 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool new_all_visible_cleared);
-static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
- Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup);
+static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external);
static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -102,7 +104,7 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
uint16 infomask, Relation rel, int *remaining);
static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
-static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
+static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_required,
bool *copy);
@@ -2941,6 +2943,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
+ bool id_has_external = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3025,7 +3028,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
/*
- * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
+ * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
* properly.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3036,9 +3039,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
/* the new tuple is ready, except for this: */
newtup->t_tableOid = RelationGetRelid(relation);
- /* Determine columns modified by the update. */
- modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
- &oldtup, newtup);
+ /*
+ * Determine columns modified by the update. Additionally, identify
+ * whether any of the unmodified replica identity key attributes in the
+ * old tuple is externally stored or not. This is required because for
+ * such attributes the flattened value won't be WAL logged as part of the
+ * new tuple so we must include it as part of the old_key_tuple. See
+ * ExtractReplicaIdentity.
+ */
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3639,10 +3650,12 @@ l2:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
* ExtractReplicaIdentity() will return NULL if nothing needs to be
- * logged.
+ * logged. Pass old key required as true only if the replica identity key
+ * columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs),
+ bms_overlap(modified_attrs, id_attrs) ||
+ id_has_external,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -3798,47 +3811,15 @@ l2:
}
/*
- * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapDetermineModifiedColumns.
+ * Check if the specified attribute's values are the same. Subroutine for
+ * HeapDetermineColumnsInfo.
*/
static bool
-heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
- HeapTuple tup1, HeapTuple tup2)
+heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
+ bool isnull1, bool isnull2)
{
- Datum value1,
- value2;
- bool isnull1,
- isnull2;
Form_pg_attribute att;
- /*
- * If it's a whole-tuple reference, say "not equal". It's not really
- * worth supporting this case, since it could only succeed after a no-op
- * update, which is hardly a case worth optimizing for.
- */
- if (attrnum == 0)
- return false;
-
- /*
- * Likewise, automatically say "not equal" for any system attribute other
- * than tableOID; we cannot expect these to be consistent in a HOT chain,
- * or even to be set correctly yet in the new tuple.
- */
- if (attrnum < 0)
- {
- if (attrnum != TableOidAttributeNumber)
- return false;
- }
-
- /*
- * Extract the corresponding values. XXX this is pretty inefficient if
- * there are many indexed columns. Should HeapDetermineModifiedColumns do
- * a single heap_deform_tuple call on each tuple, instead? But that
- * doesn't work for system columns ...
- */
- value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
- value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
-
/*
* If one value is NULL and other is not, then they are certainly not
* equal
@@ -3880,24 +3861,96 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
* Given an updated tuple, determine (and return into the output bitmapset),
* from those listed as interesting, the set of columns that changed.
*
- * The input bitmapset is destructively modified; that is OK since this is
- * invoked at most once in heap_update.
+ * has_external indicates if any of the unmodified attributes (from those
+ * listed as interesting) of the old tuple is a member of external_cols and is
+ * stored externally.
+ *
+ * The input interesting_cols bitmapset is destructively modified; that is OK
+ * since this is invoked at most once in heap_update.
*/
static Bitmapset *
-HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
- HeapTuple oldtup, HeapTuple newtup)
+HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ Bitmapset *external_cols,
+ HeapTuple oldtup, HeapTuple newtup,
+ bool *has_external)
{
- int attnum;
+ int attrnum;
Bitmapset *modified = NULL;
+ TupleDesc tupdesc = RelationGetDescr(relation);
- while ((attnum = bms_first_member(interesting_cols)) >= 0)
+ while ((attrnum = bms_first_member(interesting_cols)) >= 0)
{
- attnum += FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool isnull1,
+ isnull2;
+
+ attrnum += FirstLowInvalidHeapAttributeNumber;
- if (!heap_tuple_attr_equals(RelationGetDescr(relation),
- attnum, oldtup, newtup))
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
modified = bms_add_member(modified,
- attnum - FirstLowInvalidHeapAttributeNumber);
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
+
+ /*
+ * Extract the corresponding values. XXX this is pretty inefficient
+ * if there are many indexed columns. Should we do a single
+ * heap_deform_tuple call on each tuple, instead? But that doesn't
+ * work for system columns ...
+ */
+ value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
+ value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
+
+ if (!heap_attr_equals(tupdesc, attrnum, value1,
+ value2, isnull1, isnull2))
+ {
+ modified = bms_add_member(modified,
+ attrnum -
+ FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+
+ /*
+ * No need to check attributes that can't be stored externally. Note
+ * that system attributes can't be stored externally.
+ */
+ if (attrnum < 0 || isnull1 ||
+ TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /*
+ * Check if the old tuple's attribute is stored externally and is a
+ * member of external_cols.
+ */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) &&
+ bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber,
+ external_cols))
+ *has_external = true;
}
return modified;
@@ -7661,14 +7714,14 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
* Returns NULL if there's no need to log an identity or if there's no suitable
* key defined.
*
- * key_changed should be false if caller knows that no replica identity
- * columns changed value. It's always true in the DELETE case.
+ * Pass key_required true if any replica identity columns changed value, or if
+ * any of them have any external data. Delete must always pass true.
*
* *copy is set to true if the returned tuple is a modified copy rather than
* the same tuple that was passed in.
*/
static HeapTuple
-ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
+ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
bool *copy)
{
TupleDesc desc = RelationGetDescr(relation);
@@ -7700,8 +7753,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
return tp;
}
- /* if the key hasn't changed and we're only logging the key, we're done */
- if (!key_changed)
+ /* if the key isn't required and we're only logging the key, we're done */
+ if (!key_required)
return NULL;
/* find out the replica identity columns */
@@ -7709,10 +7762,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
/*
- * If there's no defined replica identity columns, treat as !key_changed.
+ * If there's no defined replica identity columns, treat as !key_required.
* (This case should not be reachable from heap_update, since that should
- * calculate key_changed accurately. But heap_delete just passes constant
- * true for key_changed, so we can hit this case in deletes.)
+ * calculate key_required accurately. But heap_delete just passes
+ * constant true for key_required, so we can hit this case in deletes.)
*/
if (bms_is_empty(idattrs))
return NULL;
--
2.18.4
On Fri, Feb 11, 2022 at 12:00 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
Attached the patches which fixed the above two comments and the first comment in
my previous mail [1], the rest is the same as before.
I ran the tests on all branches, they all passed as expected.
Thanks, these look good to me. I'll push these early next week
(Monday) unless there are any more suggestions or comments.
--
With Regards,
Amit Kapila.
On Fri, Feb 11, 2022 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 11, 2022 at 12:00 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:Attached the patches which fixed the above two comments and the first comment in
my previous mail [1], the rest is the same as before.
I ran the tests on all branches, they all passed as expected.Thanks, these look good to me. I'll push these early next week
(Monday) unless there are any more suggestions or comments.
Pushed!
--
With Regards,
Amit Kapila.
On Mon, Feb 14, 2022 at 2:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 11, 2022 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 11, 2022 at 12:00 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:Attached the patches which fixed the above two comments and the first comment in
my previous mail [1], the rest is the same as before.
I ran the tests on all branches, they all passed as expected.Thanks, these look good to me. I'll push these early next week
(Monday) unless there are any more suggestions or comments.Pushed!
Thanks!!
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,
On 2022-02-14 14:54:41 +0530, Amit Kapila wrote:
On Fri, Feb 11, 2022 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 11, 2022 at 12:00 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:Attached the patches which fixed the above two comments and the first comment in
my previous mail [1], the rest is the same as before.
I ran the tests on all branches, they all passed as expected.Thanks, these look good to me. I'll push these early next week
(Monday) unless there are any more suggestions or comments.Pushed!
Thanks for all the work on this!