Question about behavior of deletes with REPLICA IDENTITY NOTHING

Started by James Colemanabout 2 years ago29 messageshackers
Jump to latest
#1James Coleman
jtc331@gmail.com

Hello,

We recently noticed some behavior that seems reasonable but also
surprised our engineers based on the docs.

If we have this setup:
create table items(i int);
insert into items(i) values (1);
create publication test_pub for all tables;

Then when we:
delete from items where i = 1;

we get:
ERROR: cannot delete from table "items" because it does not have a
replica identity and publishes deletes
HINT: To enable deleting from the table, set REPLICA IDENTITY using
ALTER TABLE.

Fair enough. But if we do this:
alter table items replica identity nothing;

because the docs [1] say that NOTHING means "Records no information
about the old row." We still get the same error when we try the DELETE
again.

The publication docs [2] say "A published table must have a replica
identity configured in order to be able to replicate UPDATE and DELETE
operations, so that appropriate rows to update or delete can be
identified on the subscriber side."

We interpreted the intersection of these two docs to imply that if you
explicitly configured NOTHING that the publication would simply not
log anything about the original row. Part of the confusion I think was
fed by reading "must have a replica identity set" as "have selected
one of the options via ALTER TABLE REPLICA IDENTITY" -- i.e., as
meaning that a setting has been configured rather than being about a
subset of those possible configuration values/a specific key existing
on the table.

I'm wondering if this might be a surprise to anyone else, and if so,
is there a minor docs tweak that might avoid the confusion?

Thanks,
James Coleman

1: https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
2: https://www.postgresql.org/docs/current/logical-replication-publication.html

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: James Coleman (#1)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:

We recently noticed some behavior that seems reasonable but also
surprised our engineers based on the docs.

If we have this setup:
create table items(i int);
insert into items(i) values (1);
create publication test_pub for all tables;

Then when we:
delete from items where i = 1;

we get:
ERROR: cannot delete from table "items" because it does not have a
replica identity and publishes deletes
HINT: To enable deleting from the table, set REPLICA IDENTITY using
ALTER TABLE.

Fair enough. But if we do this:
alter table items replica identity nothing;

because the docs [1] say that NOTHING means "Records no information
about the old row." We still get the same error when we try the DELETE
again.

Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
"REPLICA IDENTITY USING INDEX ..." if the index is dropped.

See "pg_class": the column "relreplident" is not nullable.

Yours,
Laurenz Albe

#3James Coleman
jtc331@gmail.com
In reply to: Laurenz Albe (#2)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Wed, Feb 7, 2024 at 3:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:

We recently noticed some behavior that seems reasonable but also
surprised our engineers based on the docs.

If we have this setup:
create table items(i int);
insert into items(i) values (1);
create publication test_pub for all tables;

Then when we:
delete from items where i = 1;

we get:
ERROR: cannot delete from table "items" because it does not have a
replica identity and publishes deletes
HINT: To enable deleting from the table, set REPLICA IDENTITY using
ALTER TABLE.

Fair enough. But if we do this:
alter table items replica identity nothing;

because the docs [1] say that NOTHING means "Records no information
about the old row." We still get the same error when we try the DELETE
again.

Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
"REPLICA IDENTITY USING INDEX ..." if the index is dropped.

See "pg_class": the column "relreplident" is not nullable.

Right, I think the confusing point for us is that the docs for NOTHING
("Records no information about the old row") imply you can decide you
don't have to record anything if you don't want to do so, but the
publication feature is effectively overriding that and asserting that
you can't make that choice.

Regards,
James Coleman

#4Peter Smith
smithpb2250@gmail.com
In reply to: James Coleman (#3)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Thu, Feb 8, 2024 at 9:04 AM James Coleman <jtc331@gmail.com> wrote:

On Wed, Feb 7, 2024 at 3:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:

We recently noticed some behavior that seems reasonable but also
surprised our engineers based on the docs.

If we have this setup:
create table items(i int);
insert into items(i) values (1);
create publication test_pub for all tables;

Then when we:
delete from items where i = 1;

we get:
ERROR: cannot delete from table "items" because it does not have a
replica identity and publishes deletes
HINT: To enable deleting from the table, set REPLICA IDENTITY using
ALTER TABLE.

Fair enough. But if we do this:
alter table items replica identity nothing;

because the docs [1] say that NOTHING means "Records no information
about the old row." We still get the same error when we try the DELETE
again.

Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
"REPLICA IDENTITY USING INDEX ..." if the index is dropped.

See "pg_class": the column "relreplident" is not nullable.

Right, I think the confusing point for us is that the docs for NOTHING
("Records no information about the old row") imply you can decide you
don't have to record anything if you don't want to do so, but the
publication feature is effectively overriding that and asserting that
you can't make that choice.

Hi, I can see how the current docs could be interpreted in a way that
was not intended.

~~~

To emphasise the DEFAULT behaviour that Laurenze described, I felt
there could be another sentence about DEFAULT, the same as there is
already for the USING INDEX case.

BEFORE [1]https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
Records the old values of the columns of the primary key, if any. This
is the default for non-system tables.

SUGGESTION
Records the old values of the columns of the primary key, if any. This
is the default for non-system tables. If there is no primary key, the
behavior is the same as NOTHING.

~~~

If that is done, then would a publication docs tweak like the one
below clarify things sufficiently?

BEFORE [2]https://www.postgresql.org/docs/current/logical-replication-publication.html
If a table without a replica identity is added to a publication that
replicates UPDATE or DELETE operations then subsequent UPDATE or
DELETE operations will cause an error on the publisher.

SUGGESTION
If a table without a replica identity (or with replica identity
behavior equivalent to NOTHING) is added to a publication that
replicates UPDATE or DELETE operations then subsequent UPDATE or
DELETE operations will cause an error on the publisher.

======
[1]: https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
[2]: https://www.postgresql.org/docs/current/logical-replication-publication.html

Kind Regards,
Peter Smith.
Fujitsu Australia

#5James Coleman
jtc331@gmail.com
In reply to: Peter Smith (#4)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Wed, Feb 7, 2024 at 6:04 PM Peter Smith <smithpb2250@gmail.com> wrote:

On Thu, Feb 8, 2024 at 9:04 AM James Coleman <jtc331@gmail.com> wrote:

On Wed, Feb 7, 2024 at 3:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:

We recently noticed some behavior that seems reasonable but also
surprised our engineers based on the docs.

If we have this setup:
create table items(i int);
insert into items(i) values (1);
create publication test_pub for all tables;

Then when we:
delete from items where i = 1;

we get:
ERROR: cannot delete from table "items" because it does not have a
replica identity and publishes deletes
HINT: To enable deleting from the table, set REPLICA IDENTITY using
ALTER TABLE.

Fair enough. But if we do this:
alter table items replica identity nothing;

because the docs [1] say that NOTHING means "Records no information
about the old row." We still get the same error when we try the DELETE
again.

Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
"REPLICA IDENTITY USING INDEX ..." if the index is dropped.

See "pg_class": the column "relreplident" is not nullable.

Right, I think the confusing point for us is that the docs for NOTHING
("Records no information about the old row") imply you can decide you
don't have to record anything if you don't want to do so, but the
publication feature is effectively overriding that and asserting that
you can't make that choice.

Hi, I can see how the current docs could be interpreted in a way that
was not intended.

~~~

To emphasise the DEFAULT behaviour that Laurenze described, I felt
there could be another sentence about DEFAULT, the same as there is
already for the USING INDEX case.

BEFORE [1]
Records the old values of the columns of the primary key, if any. This
is the default for non-system tables.

SUGGESTION
Records the old values of the columns of the primary key, if any. This
is the default for non-system tables. If there is no primary key, the
behavior is the same as NOTHING.

~~~

If that is done, then would a publication docs tweak like the one
below clarify things sufficiently?

BEFORE [2]
If a table without a replica identity is added to a publication that
replicates UPDATE or DELETE operations then subsequent UPDATE or
DELETE operations will cause an error on the publisher.

SUGGESTION
If a table without a replica identity (or with replica identity
behavior equivalent to NOTHING) is added to a publication that
replicates UPDATE or DELETE operations then subsequent UPDATE or
DELETE operations will cause an error on the publisher.

======
[1] https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
[2] https://www.postgresql.org/docs/current/logical-replication-publication.html

Kind Regards,
Peter Smith.
Fujitsu Australia

Thanks for looking at this!

Yes, both of those changes together would make this unambiguous (and,
I think, easier to mentally parse).

Thanks,
James Coleman

#6Peter Smith
smithpb2250@gmail.com
In reply to: James Coleman (#5)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Thu, Feb 8, 2024 at 11:12 AM James Coleman <jtc331@gmail.com> wrote:

On Wed, Feb 7, 2024 at 6:04 PM Peter Smith <smithpb2250@gmail.com> wrote:

On Thu, Feb 8, 2024 at 9:04 AM James Coleman <jtc331@gmail.com> wrote:

On Wed, Feb 7, 2024 at 3:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:

We recently noticed some behavior that seems reasonable but also
surprised our engineers based on the docs.

If we have this setup:
create table items(i int);
insert into items(i) values (1);
create publication test_pub for all tables;

Then when we:
delete from items where i = 1;

we get:
ERROR: cannot delete from table "items" because it does not have a
replica identity and publishes deletes
HINT: To enable deleting from the table, set REPLICA IDENTITY using
ALTER TABLE.

Fair enough. But if we do this:
alter table items replica identity nothing;

because the docs [1] say that NOTHING means "Records no information
about the old row." We still get the same error when we try the DELETE
again.

Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
"REPLICA IDENTITY USING INDEX ..." if the index is dropped.

See "pg_class": the column "relreplident" is not nullable.

Right, I think the confusing point for us is that the docs for NOTHING
("Records no information about the old row") imply you can decide you
don't have to record anything if you don't want to do so, but the
publication feature is effectively overriding that and asserting that
you can't make that choice.

Hi, I can see how the current docs could be interpreted in a way that
was not intended.

~~~

To emphasise the DEFAULT behaviour that Laurenze described, I felt
there could be another sentence about DEFAULT, the same as there is
already for the USING INDEX case.

BEFORE [1]
Records the old values of the columns of the primary key, if any. This
is the default for non-system tables.

SUGGESTION
Records the old values of the columns of the primary key, if any. This
is the default for non-system tables. If there is no primary key, the
behavior is the same as NOTHING.

~~~

If that is done, then would a publication docs tweak like the one
below clarify things sufficiently?

BEFORE [2]
If a table without a replica identity is added to a publication that
replicates UPDATE or DELETE operations then subsequent UPDATE or
DELETE operations will cause an error on the publisher.

SUGGESTION
If a table without a replica identity (or with replica identity
behavior equivalent to NOTHING) is added to a publication that
replicates UPDATE or DELETE operations then subsequent UPDATE or
DELETE operations will cause an error on the publisher.

======
[1] https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
[2] https://www.postgresql.org/docs/current/logical-replication-publication.html

Kind Regards,
Peter Smith.
Fujitsu Australia

Thanks for looking at this!

Yes, both of those changes together would make this unambiguous (and,
I think, easier to mentally parse).

OK, here then is a patch to do like that.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v1-0001-replica-identity-clarifications.patchapplication/octet-stream; name=v1-0001-replica-identity-clarifications.patchDownload+4-3
#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Peter Smith (#6)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:

-   how to set the replica identity.  If a table without a replica identity is
+   how to set the replica identity.  If a table without a replica identity
+   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
or set to a primary key or index that doesn't exist) is added ...

Yours,
Laurenz Albe

#8Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Laurenz Albe (#7)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:

-   how to set the replica identity.  If a table without a replica identity is
+   how to set the replica identity.  If a table without a replica identity
+   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
or set to a primary key or index that doesn't exist) is added ...

Another possibility is just to improve the documentation of various
options as follows.

DEFAULT

If there is a primary key, record the old values of the columns of the
primary key. Otherwise it acts as NOTHING. This is the default for
non-system tables.

USING INDEX index_name

Records the old values of the columns covered by the named index, that
must be unique, not partial, not deferrable, and include only columns
marked NOT NULL. If this index is dropped, the behavior is the same as
NOTHING.

FULL

Records the old values of all columns in the row.

NOTHING

Records no information about the old row. This is equivalent to having
no replica identity. This is the default for system tables.

--
Best Wishes,
Ashutosh Bapat

#9James Coleman
jtc331@gmail.com
In reply to: Laurenz Albe (#7)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Wed, Feb 7, 2024 at 11:27 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:

-   how to set the replica identity.  If a table without a replica identity is
+   how to set the replica identity.  If a table without a replica identity
+   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
or set to a primary key or index that doesn't exist) is added ...

I think that would work also. I was reading the initial suggestion as
"(or with replica identity behavior the same as..." as defining what
"without a replica identity" meant, which would avoid the confusion.
But your proposal is more explicit and more succinct, so I think it's
the better option of the two.

Regards,
James Coleman

#10James Coleman
jtc331@gmail.com
In reply to: Ashutosh Bapat (#8)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Thu, Feb 8, 2024 at 4:47 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:

-   how to set the replica identity.  If a table without a replica identity is
+   how to set the replica identity.  If a table without a replica identity
+   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
or set to a primary key or index that doesn't exist) is added ...

Another possibility is just to improve the documentation of various
options as follows.

DEFAULT

If there is a primary key, record the old values of the columns of the
primary key. Otherwise it acts as NOTHING. This is the default for
non-system tables.

USING INDEX index_name

Records the old values of the columns covered by the named index, that
must be unique, not partial, not deferrable, and include only columns
marked NOT NULL. If this index is dropped, the behavior is the same as
NOTHING.

FULL

Records the old values of all columns in the row.

NOTHING

Records no information about the old row. This is equivalent to having
no replica identity. This is the default for system tables.

This is the simplest change, and it does solve the confusion, so I'd
be happy with it also. The other proposals have the benefit of having
all the information necessary on the publications page rather than
requiring the user to refer to the ALTER TABLE REPLICA IDENTITY page
to understand what's meant.

Regards,
James Coleman

#11Peter Smith
smithpb2250@gmail.com
In reply to: James Coleman (#9)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

While revisiting some old threads, I found this one that seemed to
reach a conclusion, but then it seemed nothing happened.

After multiple suggestions AFAICT James preferred the docs [1]https://www.postgresql.org/docs/devel/logical-replication-publication.html
modification suggested [2]/messages/by-id/CAAaqYe91iO3dfUnVmBs4M-4aUX_zHmPN72ELE7c_8qAO_toPmA@mail.gmail.com by Laurenz.

Should we make a CF entry for this with the status RfC, or was the
whole thing abandoned for some reason?

======
[1]: https://www.postgresql.org/docs/devel/logical-replication-publication.html
[2]: /messages/by-id/CAAaqYe91iO3dfUnVmBs4M-4aUX_zHmPN72ELE7c_8qAO_toPmA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#12James Coleman
jtc331@gmail.com
In reply to: Peter Smith (#11)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Mon, Dec 16, 2024 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote:

While revisiting some old threads, I found this one that seemed to
reach a conclusion, but then it seemed nothing happened.

After multiple suggestions AFAICT James preferred the docs [1]
modification suggested [2] by Laurenz.

Should we make a CF entry for this with the status RfC, or was the
whole thing abandoned for some reason?

======
[1] https://www.postgresql.org/docs/devel/logical-replication-publication.html
[2] /messages/by-id/CAAaqYe91iO3dfUnVmBs4M-4aUX_zHmPN72ELE7c_8qAO_toPmA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Yes, I would appreciate it moving forward.

Did you want to create the CF entry or should I?

Regards,
James Coleman

#13Peter Smith
smithpb2250@gmail.com
In reply to: James Coleman (#12)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Wed, Dec 18, 2024 at 2:50 AM James Coleman <jtc331@gmail.com> wrote:

On Mon, Dec 16, 2024 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote:

While revisiting some old threads, I found this one that seemed to
reach a conclusion, but then it seemed nothing happened.

After multiple suggestions AFAICT James preferred the docs [1]
modification suggested [2] by Laurenz.

Should we make a CF entry for this with the status RfC, or was the
whole thing abandoned for some reason?

======
[1] https://www.postgresql.org/docs/devel/logical-replication-publication.html
[2] /messages/by-id/CAAaqYe91iO3dfUnVmBs4M-4aUX_zHmPN72ELE7c_8qAO_toPmA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Yes, I would appreciate it moving forward.

Did you want to create the CF entry or should I?

I did it. See here: https://commitfest.postgresql.org/51/5445/ marked
it as Ready for Committer.

I was unsure whether to record the author as you (the thread author)
or as Laurenz (the favoured patch author) so I just left some fields
blank.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: James Coleman (#10)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Thu, Feb 8, 2024 at 7:24 PM James Coleman <jtc331@gmail.com> wrote:

On Thu, Feb 8, 2024 at 4:47 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:

-   how to set the replica identity.  If a table without a replica identity is
+   how to set the replica identity.  If a table without a replica identity
+   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
or set to a primary key or index that doesn't exist) is added ...

Another possibility is just to improve the documentation of various
options as follows.

DEFAULT

If there is a primary key, record the old values of the columns of the
primary key. Otherwise it acts as NOTHING. This is the default for
non-system tables.

USING INDEX index_name

Records the old values of the columns covered by the named index, that
must be unique, not partial, not deferrable, and include only columns
marked NOT NULL. If this index is dropped, the behavior is the same as
NOTHING.

FULL

Records the old values of all columns in the row.

NOTHING

Records no information about the old row. This is equivalent to having
no replica identity. This is the default for system tables.

This is the simplest change, and it does solve the confusion, so I'd
be happy with it also. The other proposals have the benefit of having
all the information necessary on the publications page rather than
requiring the user to refer to the ALTER TABLE REPLICA IDENTITY page
to understand what's meant.

There is no harm in having it at both places (publications page and
ALTER TABLE REPLICA IDENTITY page). Would someone be interested in
preparing a patch with the changes agreed upon?

--
With Regards,
Amit Kapila.

#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Laurenz Albe (#7)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:

-   how to set the replica identity.  If a table without a replica identity is
+   how to set the replica identity.  If a table without a replica identity
+   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
or set to a primary key or index that doesn't exist) is added ...

Is it correct to say "set to a primary key or index that doesn't
exist"? Because when it is set to the primary key then it should work.

I think Peter's proposal along with Ashutosh's proposal is the simpler
approach to clarify things in this area but I am fine if others find
some other way of updating docs better.

--
With Regards,
Amit Kapila.

#16Robert Treat
xzilla@users.sourceforge.net
In reply to: Amit Kapila (#15)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Wed, Dec 18, 2024 at 5:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:

-   how to set the replica identity.  If a table without a replica identity is
+   how to set the replica identity.  If a table without a replica identity
+   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
or set to a primary key or index that doesn't exist) is added ...

Is it correct to say "set to a primary key or index that doesn't
exist"? Because when it is set to the primary key then it should work.

I think Peter's proposal along with Ashutosh's proposal is the simpler
approach to clarify things in this area but I am fine if others find
some other way of updating docs better.

After reading this thread, I think part of the confusion here is that
technically speaking there is no such thing as having "no replica
identity"; when a table is created, by default it's "replica identity"
is set to DEFAULT, and how it behaves at that point will be dependent
on the structure of the table (PK or no PK), not whether it is being
replicated/published. To that end, I've taken the above suggestions
and re-worked them to remove that language, as well as add some
additional clarity.

Patch attached for this, I am going to update the commitfest with
myself as author and will work this further if needed. Thanks all.

Robert Treat
https://xzilla.net

Attachments:

0001-Replica-Identity-clarifications.patchapplication/octet-stream; name=0001-Replica-Identity-clarifications.patchDownload+6-4
#17Peter Smith
smithpb2250@gmail.com
In reply to: Robert Treat (#16)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Sat, Jan 4, 2025 at 4:23 AM Robert Treat <rob@xzilla.net> wrote:

On Wed, Dec 18, 2024 at 5:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:

-   how to set the replica identity.  If a table without a replica identity is
+   how to set the replica identity.  If a table without a replica identity
+   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
or set to a primary key or index that doesn't exist) is added ...

Is it correct to say "set to a primary key or index that doesn't
exist"? Because when it is set to the primary key then it should work.

I think Peter's proposal along with Ashutosh's proposal is the simpler
approach to clarify things in this area but I am fine if others find
some other way of updating docs better.

After reading this thread, I think part of the confusion here is that
technically speaking there is no such thing as having "no replica
identity"; when a table is created, by default it's "replica identity"
is set to DEFAULT, and how it behaves at that point will be dependent
on the structure of the table (PK or no PK), not whether it is being
replicated/published. To that end, I've taken the above suggestions
and re-worked them to remove that language, as well as add some
additional clarity.

Patch attached for this, I am going to update the commitfest with
myself as author and will work this further if needed. Thanks all.

Robert Treat
https://xzilla.net

Hi Robert.

Thanks for picking up this patch.

Some review comments for the 0001 patch.

======
doc/src/sgml/logical-replication.sgml

1.
    fallback if no other solution is possible.  If a replica identity other
    than <literal>FULL</literal> is set on the publisher side, a
replica identity
    comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
-   how to set the replica identity.  If a table without a replica identity is
+   side.  If a table with replica identity set to <literal>NOTHING</literal>
+   (or set to use a primary key or index that doesn't exist) is
    added to a publication that replicates <command>UPDATE</command>
    or <command>DELETE</command> operations then
    subsequent <command>UPDATE</command> or <command>DELETE</command>
    operations will cause an error on the publisher.  <command>INSERT</command>
    operations can proceed regardless of any replica identity.
+   See <xref linkend="sql-altertable-replica-identity"/> for details on
+   how to set the replica identity.

1a.
That part "If a table with replica identity set to
<literal>NOTHING</literal> (or set to use a primary key or index that
doesn't exist) is added ..." is not very clear to me.

IIUC, there are 3 ways for this to be a problem:
i) RI is set to NOTHING
ii) RI is set to DEFAULT but there is no valid PK ==> same as NOTHING
iii) RI is set USING INDEX but then that index gets dropped ==> same as NOTHING

To avoid any misunderstandings, why don't we just spell that out in
full? So, ...

SUGGESTION
A replica identity behaves the same as <literal>NOTHING</literal> when
it is set to <literal>DEFAULT</literal> and there is no primary key,
or when it is set <literal>USING INDEX</literal> but the index no
longer exists. If a table with replica identity set to
<literal>NOTHING</literal> (or behaving the same as
<literal>NOTHING</literal>) is added to a publication that replicates
<command>UPDATE</command> or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>
operations will cause an error on the publisher.
~

1b.
I've always thought that for this important topic of logical
replication it is unusual that there is not even a heading for this
topic anywhere. This makes it unnecessarily difficult to find this
information. IMO it would greatly help just to have a "Replica
Identity" subsection for all this paragraph, so then it will appear
nicely in the Chapter 29 table-of-contents making it much easier to
find.

~

1c.
That great big slab of paragraph text is hard to read. I suspect it
was done that way simply to separate the RI topic visually from the
other (paragraph) topics of the sect1 heading. But now, after we
introduce the sect2 heading (my suggestion #1b above), we don't have
to do that anymore, so more blank lines can be added and it improves
the readability a lot.

======
src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml

2.
         <para>
-         Records the old values of the columns of the primary key, if any.
+         Records the old values of the columns of the primary key. When there
+         is no primary key, the behavior is the same as
<literal>NOTHING</literal>.
          This is the default for non-system tables.
         </para>

Currently what "This" refers to seems ambiguous. It might be better to
rearrange to put that last sentence as second.

SUGGESTION
Records the old values of the columns of the primary key. This is the
default for non-system tables. When there is no primary key, the
behavior is the same as NOTHING.

======

PSA a diff patch atop yours which makes my suggested changes

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

ps_diffs_atop_roberts_patch.txttext/plain; charset=US-ASCII; name=ps_diffs_atop_roberts_patch.txtDownload+53-33
#18Robert Treat
xzilla@users.sourceforge.net
In reply to: Peter Smith (#17)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Thu, Jan 9, 2025 at 2:46 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Sat, Jan 4, 2025 at 4:23 AM Robert Treat <rob@xzilla.net> wrote:

On Wed, Dec 18, 2024 at 5:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:

-   how to set the replica identity.  If a table without a replica identity is
+   how to set the replica identity.  If a table without a replica identity
+   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
or set to a primary key or index that doesn't exist) is added ...

Is it correct to say "set to a primary key or index that doesn't
exist"? Because when it is set to the primary key then it should work.

I think Peter's proposal along with Ashutosh's proposal is the simpler
approach to clarify things in this area but I am fine if others find
some other way of updating docs better.

After reading this thread, I think part of the confusion here is that
technically speaking there is no such thing as having "no replica
identity"; when a table is created, by default it's "replica identity"
is set to DEFAULT, and how it behaves at that point will be dependent
on the structure of the table (PK or no PK), not whether it is being
replicated/published. To that end, I've taken the above suggestions
and re-worked them to remove that language, as well as add some
additional clarity.

Patch attached for this, I am going to update the commitfest with
myself as author and will work this further if needed. Thanks all.

Robert Treat
https://xzilla.net

Hi Robert.

Thanks for picking up this patch.

Some review comments for the 0001 patch.

======
doc/src/sgml/logical-replication.sgml

1.
fallback if no other solution is possible.  If a replica identity other
than <literal>FULL</literal> is set on the publisher side, a
replica identity
comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
-   how to set the replica identity.  If a table without a replica identity is
+   side.  If a table with replica identity set to <literal>NOTHING</literal>
+   (or set to use a primary key or index that doesn't exist) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>
operations will cause an error on the publisher.  <command>INSERT</command>
operations can proceed regardless of any replica identity.
+   See <xref linkend="sql-altertable-replica-identity"/> for details on
+   how to set the replica identity.

1a.
That part "If a table with replica identity set to
<literal>NOTHING</literal> (or set to use a primary key or index that
doesn't exist) is added ..." is not very clear to me.

IIUC, there are 3 ways for this to be a problem:
i) RI is set to NOTHING
ii) RI is set to DEFAULT but there is no valid PK ==> same as NOTHING
iii) RI is set USING INDEX but then that index gets dropped ==> same as NOTHING

To avoid any misunderstandings, why don't we just spell that out in
full? So, ...

SUGGESTION
A replica identity behaves the same as <literal>NOTHING</literal> when
it is set to <literal>DEFAULT</literal> and there is no primary key,
or when it is set <literal>USING INDEX</literal> but the index no
longer exists. If a table with replica identity set to
<literal>NOTHING</literal> (or behaving the same as
<literal>NOTHING</literal>) is added to a publication that replicates
<command>UPDATE</command> or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>
operations will cause an error on the publisher.
~

This felt a little wordy, but incorporated it into an updated version.

1b.
I've always thought that for this important topic of logical
replication it is unusual that there is not even a heading for this
topic anywhere. This makes it unnecessarily difficult to find this
information. IMO it would greatly help just to have a "Replica
Identity" subsection for all this paragraph, so then it will appear
nicely in the Chapter 29 table-of-contents making it much easier to
find.

~

1c.
That great big slab of paragraph text is hard to read. I suspect it
was done that way simply to separate the RI topic visually from the
other (paragraph) topics of the sect1 heading. But now, after we
introduce the sect2 heading (my suggestion #1b above), we don't have
to do that anymore, so more blank lines can be added and it improves
the readability a lot.

Agreed.

======
src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml

2.
<para>
-         Records the old values of the columns of the primary key, if any.
+         Records the old values of the columns of the primary key. When there
+         is no primary key, the behavior is the same as
<literal>NOTHING</literal>.
This is the default for non-system tables.
</para>

Currently what "This" refers to seems ambiguous. It might be better to
rearrange to put that last sentence as second.

SUGGESTION
Records the old values of the columns of the primary key. This is the
default for non-system tables. When there is no primary key, the
behavior is the same as NOTHING.

======

+1

PSA a diff patch atop yours which makes my suggested changes

Updated patch incorporating your latest changes attached.

Robert Treat
https://xzilla.net

Attachments:

v2-0001-Expand-and-clarify-Replica-Identity-information.patchapplication/octet-stream; name=v2-0001-Expand-and-clarify-Replica-Identity-information.patchDownload+53-32
#19Peter Smith
smithpb2250@gmail.com
In reply to: Robert Treat (#18)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

Hi Robert.

The content and rendering of patch v2 LGTM.

Should the word wrapping within the file
doc/src/sgml/logical-replication.sgml be tidied up though?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#20Robert Treat
xzilla@users.sourceforge.net
In reply to: Peter Smith (#19)
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

On Thu, Jan 9, 2025 at 10:41 PM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Robert.

The content and rendering of patch v2 LGTM.

Should the word wrapping within the file
doc/src/sgml/logical-replication.sgml be tidied up though?

Definitely couldn't hurt; Updated patch cleans that up a bit and
tweaks the link to alter table replica status.

Robert Treat
https://xzilla.net

Attachments:

v3-0001-Expand-and-clarify-Replica-Identity-information.patchapplication/octet-stream; name=v3-0001-Expand-and-clarify-Replica-Identity-information.patchDownload+56-31
#21Peter Smith
smithpb2250@gmail.com
In reply to: Robert Treat (#20)
#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Treat (#20)
#23Robert Treat
xzilla@users.sourceforge.net
In reply to: Amit Kapila (#22)
#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Treat (#23)
#25Robert Treat
xzilla@users.sourceforge.net
In reply to: Amit Kapila (#24)
#26Peter Smith
smithpb2250@gmail.com
In reply to: Robert Treat (#25)
#27Robert Treat
xzilla@users.sourceforge.net
In reply to: James Coleman (#1)
#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Treat (#27)
#29Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#28)