Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

Started by Shlok Kyalover 1 year ago55 messages
Jump to latest
#1Shlok Kyal
shlok.kyal.oss@gmail.com

Hi,

There was an issue reported recently by Sawada-san on a different thread [1]/messages/by-id/CAD21AoA_RBkMa-6nUpBSoEP9s=46r3oq15vQkunVRCsYKXKMnA@mail.gmail.com.
I have created this thread to discuss the issue separately.

Currently, generated columns can be published only when we explicitly
specify it in the Publication column list.
An issue was found that UPDATE and DELETE are allowed on the table
even if its replica identity is set to generated columns that are not
published.
For example:
CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1)
STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;
CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
UPDATE testpub_gencol SET a = 100 WHERE a = 1;

Here the generated column 'b' is set as REPLICA IDENTITY for table
'testpub_gencol'. When we create publication 'pub_gencol' we do not
specify any column list, so column 'b' will not be published.
So, the update message generated by the last UPDATE would have NULL
for column 'b'.

To avoid the issue, we can disallow UPDATE/DELETE on table with
unpublished generated column as REPLICA IDENTITY. I have attached a
patch for the same.

[1]: /messages/by-id/CAD21AoA_RBkMa-6nUpBSoEP9s=46r3oq15vQkunVRCsYKXKMnA@mail.gmail.com

Thanks and regards,
Shlok Kyal

Attachments:

v1-0001-Disallow-UPDATE-DELETE-on-table-with-generated-co.patchapplication/octet-stream; name=v1-0001-Disallow-UPDATE-DELETE-on-table-with-generated-co.patchDownload+73-1
#2Aleksander Alekseev
aleksander@timescale.com
In reply to: Shlok Kyal (#1)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

Hi Shlok,

Here the generated column 'b' is set as REPLICA IDENTITY for table
'testpub_gencol'. When we create publication 'pub_gencol' we do not
specify any column list, so column 'b' will not be published.
So, the update message generated by the last UPDATE would have NULL
for column 'b'.

To avoid the issue, we can disallow UPDATE/DELETE on table with
unpublished generated column as REPLICA IDENTITY. I have attached a
patch for the same.

I don't think this would be a correct fix. Let's say I *don't* have
any publications:

```
=# CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1)
STORED NOT NULL);
CREATE TABLE

=# CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
CREATE INDEX

=# INSERT INTO testpub_gencol (a) VALUES (1);
INSERT 0 1

=# UPDATE testpub_gencol SET a = 100 WHERE a = 1;
UPDATE 1
eax=# SELECT * FROM testpub_gencol ;
a | b
-----+-----
100 | 101
(1 row)
```

So far everything works fine. You are saying that when one creates a
publication UPDATEs should stop working. That would be rather
surprising behavior for a typical user not to mention that it will
break the current behavior.

I believe one would expect that both UPDATEs and the publication
should continue to work. Perhaps we should forbid the creation of a
publication like this instead. Or alternatively include a generated
column to the publication list if it's used as a replica identity. Or
maybe even keep everything as is.

Thoughts?

--
Best regards,
Aleksander Alekseev

#3Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Aleksander Alekseev (#2)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

Hi Aleksander,

Here the generated column 'b' is set as REPLICA IDENTITY for table
'testpub_gencol'. When we create publication 'pub_gencol' we do not
specify any column list, so column 'b' will not be published.
So, the update message generated by the last UPDATE would have NULL
for column 'b'.

To avoid the issue, we can disallow UPDATE/DELETE on table with
unpublished generated column as REPLICA IDENTITY. I have attached a
patch for the same.

I don't think this would be a correct fix. Let's say I *don't* have
any publications:

```
=# CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1)
STORED NOT NULL);
CREATE TABLE

=# CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
CREATE INDEX

=# INSERT INTO testpub_gencol (a) VALUES (1);
INSERT 0 1

=# UPDATE testpub_gencol SET a = 100 WHERE a = 1;
UPDATE 1
eax=# SELECT * FROM testpub_gencol ;
a | b
-----+-----
100 | 101
(1 row)
```

So far everything works fine. You are saying that when one creates a
publication UPDATEs should stop working. That would be rather
surprising behavior for a typical user not to mention that it will
break the current behavior.

I believe one would expect that both UPDATEs and the publication
should continue to work. Perhaps we should forbid the creation of a
publication like this instead. Or alternatively include a generated
column to the publication list if it's used as a replica identity. Or
maybe even keep everything as is.

Thoughts?

While testing I found that similar behaviors already exist in some
cases. Where once we create a publication UPDATES might stop working.
For example:
Case1:
postgres=# create table t1(c1 int);
CREATE TABLE
postgres=# insert into t1 values(1);
INSERT 0 1
postgres=# update t1 set c1 = 100 where c1 = 1;
UPDATE 1
postgres=# create publication pub for table t1;
CREATE PUBLICATION
postgres=# update t1 set c1 = 100 where c1 = 1;
ERROR: cannot update table "t1" because it does not have a replica
identity and publishes updates
HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

Case2:
postgres=# create table t2(c1 int, c2 int not null);
CREATE TABLE
postgres=# create unique index t2_idx on t2 (c2);
CREATE INDEX
postgres=# alter table t2 replica identity using index t2_idx;
ALTER TABLE
postgres=# insert into t2 values(1,1);
INSERT 0 1
postgres=# update t2 set c1 = 100 where c1 = 1;
UPDATE 1
postgres=# create publication pub2 for table t2 where (c1 > 10);
CREATE PUBLICATION
postgres=# update t2 set c1 = 100 where c1 = 1;
ERROR: cannot update table "t2"
DETAIL: Column used in the publication WHERE expression is not part
of the replica identity.

Behaviour with the patch provided in [1]/messages/by-id/CANhcyEVw4V2Awe2AB6i0E5AJLNdASShGfdBLbUd1XtWDboymCA@mail.gmail.com to resolve the issue:
postgres=# create table t3(c1 int, c2 INT GENERATED ALWAYS AS (c1 + 1)
STORED NOT NULL);
CREATE TABLE
postgres=# create unique index t3_idx on t3 (c2);
CREATE INDEX
postgres=# alter table t3 replica identity using index t3_idx;
ALTER TABLE
postgres=# insert into t3 values(1);
INSERT 0 1
postgres=# update t3 set c1 = 100 where c1 = 1;
UPDATE 1
postgres=# create publication pub3 for table t3;
CREATE PUBLICATION
postgres=# update t3 set c1 = 100 where c1 = 1;
ERROR: cannot update table "t3"
DETAIL: Column list used by the publication does not cover the
replica identity.

So, I think this behavior would be acceptable. Thoughts?

[1]: /messages/by-id/CANhcyEVw4V2Awe2AB6i0E5AJLNdASShGfdBLbUd1XtWDboymCA@mail.gmail.com

Thanks and Regards,
Shlok Kyal

#4Aleksander Alekseev
aleksander@timescale.com
In reply to: Shlok Kyal (#3)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

Hi Shlok,

So, I think this behavior would be acceptable. Thoughts?

That's a fair point, thanks for sharing. Personally I find this
behavior somewhat suboptimal but since we already have it in certain
cases I guess what you propose might be acceptable.

I'm still not entirely happy about breaking the existing behavior in
the discussed case. Not sure what the lesser evil would be - breaking
it or keeping it as is.

Some input from other people on the mailing list would be appreciated.

--
Best regards,
Aleksander Alekseev

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Aleksander Alekseev (#4)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Wed, Nov 6, 2024 at 5:48 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

So, I think this behavior would be acceptable. Thoughts?

That's a fair point, thanks for sharing. Personally I find this
behavior somewhat suboptimal but since we already have it in certain
cases I guess what you propose might be acceptable.

This is not a suboptimal behavior but a must to have, otherwise, there
is no way we can identify the row to update on the subscriber side.
Also, this is not in certain cases but in all cases for UPDATE/DELETE,
we need REPLICA IDENTITY to be set. See more about REPLICA IDENTITY in
Alter Table docs [1]https://www.postgresql.org/docs/devel/sql-altertable.html. The problem reported by Shlok is that even
though we have a REPLICA IDENTITY defined on a generated column but
still won't send the required column value (as generated columns are
skipped by default) to the subscriber which will lead to ERROR as
mentioned below. Now, one can argue that this is not expected from the
user or why the user would have such a setup but I think we should fix
the problem if it leads to unexpected behavior on the subscriber.

I'm still not entirely happy about breaking the existing behavior in
the discussed case. Not sure what the lesser evil would be - breaking
it or keeping it as is.

The current behavior is not acceptable because it would generate an
ERROR as follows on the subscriber:

2024-11-07 10:50:31.381 IST [16260] ERROR: publisher did not send
replica identity column expected by the logical replication target
relation "public.testpub_gencol"
2024-11-07 10:50:31.381 IST [16260] CONTEXT: processing remote data
for replication origin "pg_16389" during message type "UPDATE" for
replication target relation "public.testpub_gencol" in transaction
748, finished at 0/176D5D8
2024-11-07 10:50:31.398 IST [6216] LOG: background worker "logical
replication apply worker" (PID 16260) exited with exit code 1

Some input from other people on the mailing list would be appreciated.

We should fix this in the HEAD and back branches.

[1]: https://www.postgresql.org/docs/devel/sql-altertable.html

--
With Regards,
Amit Kapila.

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Shlok Kyal (#1)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

To avoid the issue, we can disallow UPDATE/DELETE on table with
unpublished generated column as REPLICA IDENTITY. I have attached a
patch for the same.

+CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+ERROR:  cannot update table "testpub_gencol"
+DETAIL:  Column list used by the publication does not cover the
replica identity.

This is not a correct ERROR message as the publication doesn't have
any column list associated with it. You have added the code to detect
this in the column list code path which I think is not required. BTW,
you also need to consider the latest commit 7054186c4e for this. I
guess you need to keep another flag in PublicationDesc to detect this
and then give an appropriate ERROR.

--
With Regards,
Amit Kapila.

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#5)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Thu, Nov 7, 2024 at 11:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 6, 2024 at 5:48 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

We should fix this in the HEAD and back branches.

BTW, I was thinking as to how to fix it on back branches and it seems
we should restrict to define REPLICA IDENTITY on stored generated
columns in the first place in back branches as those can't be
replicated. So, the following should fail:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1)
STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;

Peter, do you have an opinion on this?

[1]: https://www.postgresql.org/docs/devel/ddl-generated-columns.html

--
With Regards,
Amit Kapila.

#8Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Amit Kapila (#6)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

Hi Amit,

On Thu, 7 Nov 2024 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

To avoid the issue, we can disallow UPDATE/DELETE on table with
unpublished generated column as REPLICA IDENTITY. I have attached a
patch for the same.

+CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+ERROR:  cannot update table "testpub_gencol"
+DETAIL:  Column list used by the publication does not cover the
replica identity.

This is not a correct ERROR message as the publication doesn't have
any column list associated with it. You have added the code to detect
this in the column list code path which I think is not required. BTW,
you also need to consider the latest commit 7054186c4e for this. I
guess you need to keep another flag in PublicationDesc to detect this
and then give an appropriate ERROR.

I have addressed the comments and provided an updated patch. Also, I
am currently working to fix this issue in back branches.

Thanks and Regards,
Shlok Kyal

Attachments:

v2-0001-Disallow-UPDATE-DELETE-on-table-with-generated-co.patchapplication/octet-stream; name=v2-0001-Disallow-UPDATE-DELETE-on-table-with-generated-co.patchDownload+187-1
#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#7)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On 2024-Nov-07, Amit Kapila wrote:

BTW, I was thinking as to how to fix it on back branches and it seems
we should restrict to define REPLICA IDENTITY on stored generated
columns in the first place in back branches as those can't be
replicated. So, the following should fail:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1)
STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;

Peter, do you have an opinion on this?

I think a blanket restriction of this sort is not a good idea (at least
in back branches), because there might be people using replica
identities with stacks other than pgoutput. Would it work to enforce
the restriction when such a table is added to a publication?

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Nunca confiaré en un traidor. Ni siquiera si el traidor lo he creado yo"
(Barón Vladimir Harkonnen)

#10Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#9)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Fri, Nov 8, 2024 at 5:17 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Nov-07, Amit Kapila wrote:

BTW, I was thinking as to how to fix it on back branches and it seems
we should restrict to define REPLICA IDENTITY on stored generated
columns in the first place in back branches as those can't be
replicated. So, the following should fail:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1)
STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;

Peter, do you have an opinion on this?

I think a blanket restriction of this sort is not a good idea (at least
in back branches), because there might be people using replica
identities with stacks other than pgoutput.

Do you mean to say that people using plugins other than pgoutput may
already be sending generated columns, so defining replica identity
should be okay for them?

Would it work to enforce
the restriction when such a table is added to a publication?

But what if somebody defines REPLICA IDENTITY on the generated column
after adding the table to the publication?

--
With Regards,
Amit Kapila.

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#10)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Sat, Nov 9, 2024 at 8:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Nov 8, 2024 at 5:17 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Nov-07, Amit Kapila wrote:

BTW, I was thinking as to how to fix it on back branches and it seems
we should restrict to define REPLICA IDENTITY on stored generated
columns in the first place in back branches as those can't be
replicated. So, the following should fail:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1)
STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;

Peter, do you have an opinion on this?

I think a blanket restriction of this sort is not a good idea (at least
in back branches), because there might be people using replica
identities with stacks other than pgoutput.

Do you mean to say that people using plugins other than pgoutput may
already be sending generated columns, so defining replica identity
should be okay for them?

If we don't want to add a restriction to not create replica identity
on generated columns then I think the solution similar to HEAD should
be okay which is to restrict UPDATE/DELETE in such cases.

Also, another point against restricting defining REPLICA IDENTITY on
generated columns is that we do allow generated columns to be PRIMARY
KEY which is a DEFAULT for REPLICA IDENTITY, so that also needs to be
restricted. That won't be a good idea.

--
With Regards,
Amit Kapila.

#12Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Shlok Kyal (#8)
RE: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Friday, November 8, 2024 7:06 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi Amit,

On Thu, 7 Nov 2024 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com>

wrote:

To avoid the issue, we can disallow UPDATE/DELETE on table with
unpublished generated column as REPLICA IDENTITY. I have attached a
patch for the same.

+CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol; UPDATE
+testpub_gencol SET a = 100 WHERE a = 1;
+ERROR:  cannot update table "testpub_gencol"
+DETAIL:  Column list used by the publication does not cover the
replica identity.

This is not a correct ERROR message as the publication doesn't have
any column list associated with it. You have added the code to detect
this in the column list code path which I think is not required. BTW,
you also need to consider the latest commit 7054186c4e for this. I
guess you need to keep another flag in PublicationDesc to detect this
and then give an appropriate ERROR.

I have addressed the comments and provided an updated patch. Also, I am
currently working to fix this issue in back branches.

Thanks for the patch. I am reviewing it and have some initial comments:

1.
+			char attgenerated = get_attgenerated(relid, attnum);
+

I think it's unnecessary to initialize attgenerated here because the value will
be overwritten if pubviaroot is true anyway. Also, the get_attgenerated()
is not cheap.

2.

I think the patch missed to check the case when table is marked REPLICA
IDENTITY FULL, and generated column is not published:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
UPDATE testpub_gencol SET a = 2;

I expected the UPDATE to fail in above case, but it can still pass after applying the patch.

3.

+		 * If the publication is FOR ALL TABLES we can skip the validation.
+		 */

This comment seems not clear to me, could you elaborate a bit more on this ?

4.

Also, I think the patch does not handle the FOR ALL TABLE case correctly:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;
CREATE PUBLICATION pub_gencol FOR ALL TABLEs;
UPDATE testpub_gencol SET a = 2;

I expected the UPDATE to fail in above case as well.

5.

+	else if (cmd == CMD_UPDATE && !pubdesc.replident_has_valid_gen_cols)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+				 errmsg("cannot update table \"%s\"",
+						RelationGetRelationName(rel)),
+				 errdetail("REPLICA IDENTITY consists of an unpublished generated column.")));

I think it would be better to use lower case "replica identity" to consistent
with other existing messages.

Best Regards,
Hou zj

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Zhijie Hou (Fujitsu) (#12)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On 2024-Nov-09, Amit Kapila wrote:

On Fri, Nov 8, 2024 at 5:17 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Nov-07, Amit Kapila wrote:

BTW, I was thinking as to how to fix it on back branches and it seems
we should restrict to define REPLICA IDENTITY on stored generated
columns in the first place in back branches as those can't be
replicated.

I think a blanket restriction of this sort is not a good idea (at least
in back branches), because there might be people using replica
identities with stacks other than pgoutput.

Do you mean to say that people using plugins other than pgoutput may
already be sending generated columns, so defining replica identity
should be okay for them?

Yes.

If we don't want to add a restriction to not create replica identity
on generated columns then I think the solution similar to HEAD should
be okay which is to restrict UPDATE/DELETE in such cases.

Hmm, I don't know about this. Maybe nobody cares, but I'm uneasy about
it. I'm wondering about hypothetical cases where people is already
using this combination of features in stable branches, without pgoutput.
I think it's not great to add restrictions that didn't exist when they
upgraded to some stable branch. In branch master it's probably okay,
because they'll have to test before upgrading and they'll realize the
problem and have the chance to adjust (or complain) before calling the
upgrade good. But if we do that for stable branches, we'd deprive them
of the ability to do minor upgrades, which would be Not Good.

So, another option is to do nothing for stable branches.

Would it work to enforce the restriction when such a table is added
to a publication?

But what if somebody defines REPLICA IDENTITY on the generated column
after adding the table to the publication?

Well, maybe we can restrict the change of REPLICA IDENTITY if the table
is already in a pgoutput publication?

On 2024-Nov-12, Amit Kapila wrote:

Also, another point against restricting defining REPLICA IDENTITY on
generated columns is that we do allow generated columns to be PRIMARY
KEY which is a DEFAULT for REPLICA IDENTITY, so that also needs to be
restricted. That won't be a good idea.

Oh, that's a good point too.

It's not clear to me why doesn't pgoutput cope with generated columns in
replica identities. Maybe that can be reconsidered?

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La persona que no quería pecar / estaba obligada a sentarse
en duras y empinadas sillas / desprovistas, por cierto
de blandos atenuantes" (Patricio Vogel)

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#13)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Tue, Nov 12, 2024 at 2:15 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Nov-09, Amit Kapila wrote:

On Fri, Nov 8, 2024 at 5:17 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Nov-07, Amit Kapila wrote:

BTW, I was thinking as to how to fix it on back branches and it seems
we should restrict to define REPLICA IDENTITY on stored generated
columns in the first place in back branches as those can't be
replicated.

I think a blanket restriction of this sort is not a good idea (at least
in back branches), because there might be people using replica
identities with stacks other than pgoutput.

Do you mean to say that people using plugins other than pgoutput may
already be sending generated columns, so defining replica identity
should be okay for them?

Yes.

If we don't want to add a restriction to not create replica identity
on generated columns then I think the solution similar to HEAD should
be okay which is to restrict UPDATE/DELETE in such cases.

Hmm, I don't know about this. Maybe nobody cares, but I'm uneasy about
it. I'm wondering about hypothetical cases where people is already
using this combination of features in stable branches, without pgoutput.
I think it's not great to add restrictions that didn't exist when they
upgraded to some stable branch. In branch master it's probably okay,
because they'll have to test before upgrading and they'll realize the
problem and have the chance to adjust (or complain) before calling the
upgrade good. But if we do that for stable branches, we'd deprive them
of the ability to do minor upgrades, which would be Not Good.

So, another option is to do nothing for stable branches.

Fair enough. The other point in favor of that option is that nobody
has reported this problem yet but my guess is that they would have
probably not used such a combination at least with pgoutput plugin
otherwise, they would have faced the ERRORs on the subscriber. So, we
can do this only for HEAD and decide on the fix if anyone ever reports
this problem.

Would it work to enforce the restriction when such a table is added
to a publication?

But what if somebody defines REPLICA IDENTITY on the generated column
after adding the table to the publication?

Well, maybe we can restrict the change of REPLICA IDENTITY if the table
is already in a pgoutput publication?

What about the PRIMARY KEY case as shared in my later email? Even
apart from that the plugin is decided via slot, so we won't be able to
detect from table<->publication relationship.

On 2024-Nov-12, Amit Kapila wrote:

Also, another point against restricting defining REPLICA IDENTITY on
generated columns is that we do allow generated columns to be PRIMARY
KEY which is a DEFAULT for REPLICA IDENTITY, so that also needs to be
restricted. That won't be a good idea.

Oh, that's a good point too.

It's not clear to me why doesn't pgoutput cope with generated columns in
replica identities. Maybe that can be reconsidered?

In stable branches, we intentionally skip publishing generated columns
as we assumed that the subscriber side also had a generated column.
So, sending it would be a waste of network bandwidth. OTOH, when one
tries to replicate the changes to some other database that didn't have
the generated columns concept, it would create a problem. So we
developed a new feature for HEAD as part of commits 745217a051 and
7054186c4e which allows the publication of generated columns when
explicitly specified by the users.

--
With Regards,
Amit Kapila.

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#14)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On 2024-Nov-12, Amit Kapila wrote:

On Tue, Nov 12, 2024 at 2:15 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

So, another option is to do nothing for stable branches.

Fair enough. The other point in favor of that option is that nobody
has reported this problem yet but my guess is that they would have
probably not used such a combination at least with pgoutput plugin
otherwise, they would have faced the ERRORs on the subscriber. So, we
can do this only for HEAD and decide on the fix if anyone ever reports
this problem.

Right.

Well, maybe we can restrict the change of REPLICA IDENTITY if the table
is already in a pgoutput publication?

What about the PRIMARY KEY case as shared in my later email? Even
apart from that the plugin is decided via slot, so we won't be able to
detect from table<->publication relationship.

I responded to both emails together, my response is what you quoted
below:

On 2024-Nov-12, Amit Kapila wrote:

Also, another point against restricting defining REPLICA IDENTITY on
generated columns is that we do allow generated columns to be PRIMARY
KEY which is a DEFAULT for REPLICA IDENTITY, so that also needs to be
restricted. That won't be a good idea.

Oh, that's a good point too.

(I was acknowledging this as a problem case.)

It's not clear to me why doesn't pgoutput cope with generated columns in
replica identities. Maybe that can be reconsidered?

In stable branches, we intentionally skip publishing generated columns
as we assumed that the subscriber side also had a generated column.
So, sending it would be a waste of network bandwidth. OTOH, when one
tries to replicate the changes to some other database that didn't have
the generated columns concept, it would create a problem. So we
developed a new feature for HEAD as part of commits 745217a051 and
7054186c4e which allows the publication of generated columns when
explicitly specified by the users.

Ah, I think it's good then, we don't need to do anything further on
this. It's just not supported on earlier branches (and it doesn't work
with pgoutput, though it does with other plugins); and master has a
mechanism for it to work with any output plugin.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#15)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Tue, Nov 12, 2024 at 5:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Nov-12, Amit Kapila wrote:

It's not clear to me why doesn't pgoutput cope with generated columns in
replica identities. Maybe that can be reconsidered?

In stable branches, we intentionally skip publishing generated columns
as we assumed that the subscriber side also had a generated column.
So, sending it would be a waste of network bandwidth. OTOH, when one
tries to replicate the changes to some other database that didn't have
the generated columns concept, it would create a problem. So we
developed a new feature for HEAD as part of commits 745217a051 and
7054186c4e which allows the publication of generated columns when
explicitly specified by the users.

Ah, I think it's good then, we don't need to do anything further on
this. It's just not supported on earlier branches (and it doesn't work
with pgoutput, though it does with other plugins); and master has a
mechanism for it to work with any output plugin.

I think we still need a fix for the master for the case when generated
columns are not published but are part of REPLICA IDENTITY as that
could lead to failures in applying UPDATE and DELETE on subscriber.
Am, I missing something?

--
With Regards,
Amit Kapila.

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#16)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On 2024-Nov-12, Amit Kapila wrote:

I think we still need a fix for the master for the case when generated
columns are not published but are part of REPLICA IDENTITY as that
could lead to failures in applying UPDATE and DELETE on subscriber.

Ah, I thought that was already in place.

Am, I missing something?

Nope, it's me who was missing something.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"

#18Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#12)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

Thanks for providing the comments.

On Tue, 12 Nov 2024 at 12:52, Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Friday, November 8, 2024 7:06 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi Amit,

On Thu, 7 Nov 2024 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com>

wrote:

To avoid the issue, we can disallow UPDATE/DELETE on table with
unpublished generated column as REPLICA IDENTITY. I have attached a
patch for the same.

+CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol; UPDATE
+testpub_gencol SET a = 100 WHERE a = 1;
+ERROR:  cannot update table "testpub_gencol"
+DETAIL:  Column list used by the publication does not cover the
replica identity.

This is not a correct ERROR message as the publication doesn't have
any column list associated with it. You have added the code to detect
this in the column list code path which I think is not required. BTW,
you also need to consider the latest commit 7054186c4e for this. I
guess you need to keep another flag in PublicationDesc to detect this
and then give an appropriate ERROR.

I have addressed the comments and provided an updated patch. Also, I am
currently working to fix this issue in back branches.

Thanks for the patch. I am reviewing it and have some initial comments:

1.
+                       char attgenerated = get_attgenerated(relid, attnum);
+

I think it's unnecessary to initialize attgenerated here because the value will
be overwritten if pubviaroot is true anyway. Also, the get_attgenerated()
is not cheap.

Fixed

2.

I think the patch missed to check the case when table is marked REPLICA
IDENTITY FULL, and generated column is not published:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
UPDATE testpub_gencol SET a = 2;

I expected the UPDATE to fail in above case, but it can still pass after applying the patch.

Fixed

3.

+                * If the publication is FOR ALL TABLES we can skip the validation.
+                */

This comment seems not clear to me, could you elaborate a bit more on this ?

I missed to handle the case FOR ALL TABLES. Have removed the comment.

4.

Also, I think the patch does not handle the FOR ALL TABLE case correctly:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;
CREATE PUBLICATION pub_gencol FOR ALL TABLEs;
UPDATE testpub_gencol SET a = 2;

I expected the UPDATE to fail in above case as well.

Fixed

5.

+       else if (cmd == CMD_UPDATE && !pubdesc.replident_has_valid_gen_cols)
+               ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+                                errmsg("cannot update table \"%s\"",
+                                               RelationGetRelationName(rel)),
+                                errdetail("REPLICA IDENTITY consists of an unpublished generated column.")));

I think it would be better to use lower case "replica identity" to consistent
with other existing messages.

Fixed

I have attached the updated patch here.

Thanks and Regards,
Shlok Kyal

Attachments:

v3-0001-Disallow-UPDATE-DELETE-on-table-with-generated-co.patchapplication/octet-stream; name=v3-0001-Disallow-UPDATE-DELETE-on-table-with-generated-co.patchDownload+215-14
#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#17)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Tue, Nov 12, 2024 at 7:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Nov-12, Amit Kapila wrote:

I think we still need a fix for the master for the case when generated
columns are not published but are part of REPLICA IDENTITY as that
could lead to failures in applying UPDATE and DELETE on subscriber.

Ah, I thought that was already in place.

No, we left it with the thought that we needed something for it in the
back branches as well. But now that we have decided not to do anything
for the back branches, we should fix it in HEAD.

Am, I missing something?

Nope, it's me who was missing something.

No problem, thanks for all the feedback and helping us to conclude.

--
With Regards,
Amit Kapila.

#20vignesh C
vignesh21@gmail.com
In reply to: Shlok Kyal (#18)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

On Wed, 13 Nov 2024 at 11:15, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Thanks for providing the comments.

On Tue, 12 Nov 2024 at 12:52, Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Friday, November 8, 2024 7:06 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi Amit,

On Thu, 7 Nov 2024 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 5, 2024 at 12:53 PM Shlok Kyal <shlok.kyal.oss@gmail.com>

wrote:

To avoid the issue, we can disallow UPDATE/DELETE on table with
unpublished generated column as REPLICA IDENTITY. I have attached a
patch for the same.

+CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol; UPDATE
+testpub_gencol SET a = 100 WHERE a = 1;
+ERROR:  cannot update table "testpub_gencol"
+DETAIL:  Column list used by the publication does not cover the
replica identity.

This is not a correct ERROR message as the publication doesn't have
any column list associated with it. You have added the code to detect
this in the column list code path which I think is not required. BTW,
you also need to consider the latest commit 7054186c4e for this. I
guess you need to keep another flag in PublicationDesc to detect this
and then give an appropriate ERROR.

I have addressed the comments and provided an updated patch. Also, I am
currently working to fix this issue in back branches.

Thanks for the patch. I am reviewing it and have some initial comments:

1.
+                       char attgenerated = get_attgenerated(relid, attnum);
+

I think it's unnecessary to initialize attgenerated here because the value will
be overwritten if pubviaroot is true anyway. Also, the get_attgenerated()
is not cheap.

Fixed

2.

I think the patch missed to check the case when table is marked REPLICA
IDENTITY FULL, and generated column is not published:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol;
UPDATE testpub_gencol SET a = 2;

I expected the UPDATE to fail in above case, but it can still pass after applying the patch.

Fixed

3.

+                * If the publication is FOR ALL TABLES we can skip the validation.
+                */

This comment seems not clear to me, could you elaborate a bit more on this ?

I missed to handle the case FOR ALL TABLES. Have removed the comment.

4.

Also, I think the patch does not handle the FOR ALL TABLE case correctly:

CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL);
CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b);
ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx;
CREATE PUBLICATION pub_gencol FOR ALL TABLEs;
UPDATE testpub_gencol SET a = 2;

I expected the UPDATE to fail in above case as well.

Fixed

5.

+       else if (cmd == CMD_UPDATE && !pubdesc.replident_has_valid_gen_cols)
+               ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+                                errmsg("cannot update table \"%s\"",
+                                               RelationGetRelationName(rel)),
+                                errdetail("REPLICA IDENTITY consists of an unpublished generated column.")));

I think it would be better to use lower case "replica identity" to consistent
with other existing messages.

Fixed

I have attached the updated patch here.

Few comments:
1) In the first check relation->rd_rel->relispartition also is checked
whereas in the below it is not checked, shouldn't the same check be
there below to avoid few of the function calls which are not required:
+       if (pubviaroot && relation->rd_rel->relispartition)
+       {
+               publish_as_relid =
GetTopMostAncestorInPublication(pubid, ancestors, NULL);
+
+               if (!OidIsValid(publish_as_relid))
+                       publish_as_relid = relid;
+       }
+
+                       if (pubviaroot)
+                       {
+                               /* attribute name in the child table */
+                               char       *colname =
get_attname(relid, attnum, false);
+
+                               /*
+                                * Determine the attnum for the
attribute name in parent (we
+                                * are using the column list defined
on the parent).
+                                */
+                               attnum = get_attnum(publish_as_relid, colname);
+                               attgenerated =
get_attgenerated(publish_as_relid, attnum);
+                       }
+                       else
+                               attgenerated = get_attgenerated(relid, attnum);
2) I think we could use check_and_fetch_column_list to see that it is
not a column list publication instead of below code:
+       if (!puballtables)
+       {
+               tuple = SearchSysCache2(PUBLICATIONRELMAP,
+
ObjectIdGetDatum(publish_as_relid),
+
ObjectIdGetDatum(pubid));
+
+               if (!HeapTupleIsValid(tuple))
+                       return false;
+
+               (void) SysCacheGetAttr(PUBLICATIONRELMAP, tuple,
+
Anum_pg_publication_rel_prattrs,
+                                                          &isnull);
+
+               ReleaseSysCache(tuple);
+       }
+
+       if(puballtables || isnull)
3) Since there is only a single statement, remove the enclosing parenthisis:
+               if (!pubform->pubgencols &&
+                       (pubform->pubupdate || pubform->pubdelete) &&
+                       replident_has_unpublished_gen_col(pubid,
relation, ancestors,
+
                   pubform->pubviaroot, pubform->puballtables))
+               {
+                       pubdesc->replident_has_valid_gen_cols = false;
+               }
4) Pgindent should be run there are few issues:
4.a)
+extern bool replident_has_unpublished_gen_col(Oid pubid, Relation relation,
+
                         List *ancestors, bool pubviaroot, bool
puballtables);
4.b)
+       }
+
+       if(puballtables || isnull)
+       {
+               int                     x;
+               Bitmapset  *idattrs = NULL;
4.c)
+                * generated column we should error out.
+                */
+               if(relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL &&
+                  relation->rd_att->constr &&
relation->rd_att->constr->has_generated_stored)
+                       result = true;
4.d)
+               while ((x = bms_next_member(idattrs, x)) >= 0)
+               {
+                       AttrNumber      attnum = (x +
FirstLowInvalidHeapAttributeNumber);
+                       char attgenerated;
5) You could do this in a single line comment:
+               /*
+                * Check if any REPLICA IDENTITY column is an generated column.
+                */
+               while ((x = bms_next_member(idattrs, x)) >= 0)
6) I felt one of update or delete is enough in this case as the code
path is same:
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+DELETE FROM testpub_gencol WHERE a = 100;
+
+-- error - generated column "b" is not published and REPLICA IDENTITY
is set FULL
+ALTER TABLE testpub_gencol REPLICA IDENTITY FULL;
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+DELETE FROM testpub_gencol WHERE a = 100;
+DROP PUBLICATION pub_gencol;
+
+-- ok - generated column "b" is published and is part of REPLICA IDENTITY
+CREATE PUBLICATION pub_gencol FOR TABLE testpub_gencol with
(publish_generated_columns = true);
+UPDATE testpub_gencol SET a = 100 WHERE a = 1;
+DELETE FROM testpub_gencol WHERE a = 100;

Regards,
Vignesh

#21Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: vignesh C (#20)
#22vignesh C
vignesh21@gmail.com
In reply to: Shlok Kyal (#21)
#23Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: vignesh C (#22)
#24vignesh C
vignesh21@gmail.com
In reply to: Shlok Kyal (#23)
#25Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: vignesh C (#24)
#26vignesh C
vignesh21@gmail.com
In reply to: Shlok Kyal (#25)
#27Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Shlok Kyal (#25)
#28Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: vignesh C (#26)
#29Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#27)
#30vignesh C
vignesh21@gmail.com
In reply to: Shlok Kyal (#28)
#31Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: vignesh C (#30)
#32Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Shlok Kyal (#31)
#33vignesh C
vignesh21@gmail.com
In reply to: Shlok Kyal (#31)
#34Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#32)
#35Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: vignesh C (#33)
#36Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Shlok Kyal (#35)
#37Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#36)
#38Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#36)
#39Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Shlok Kyal (#38)
#40vignesh C
vignesh21@gmail.com
In reply to: Shlok Kyal (#38)
#41Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: vignesh C (#40)
#42Amit Kapila
amit.kapila16@gmail.com
In reply to: Shlok Kyal (#41)
#43Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Amit Kapila (#42)
#44vignesh C
vignesh21@gmail.com
In reply to: Shlok Kyal (#43)
#45Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: vignesh C (#44)
#46Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Shlok Kyal (#45)
#47Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#46)
#48Amit Kapila
amit.kapila16@gmail.com
In reply to: Shlok Kyal (#47)
#49Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#48)
#50Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#49)
#51Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#50)
#52Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#51)
#53Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#52)
#54Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Peter Smith (#53)
#55Amit Kapila
amit.kapila16@gmail.com
In reply to: Shlok Kyal (#54)