Ambiguous description on new columns

Started by PG Doc comments form, over 1 year ago, 17 messages
#1 PG Doc comments form
noreply@postgresql.org

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/16/logical-replication-col-lists.html
Description:

The documentation on this page mentions:

"If no column list is specified, any columns added later are automatically
replicated."

It feels ambiguous what this could mean. Does it mean:

1/ That if you alter the table on the publisher and add a new column, it
will be replicated

2/ If you add a column list later and add a column to it, it will be
replicated

In both cases, does the subscriber automatically create this column if it
wasn't there before? I recall reading that the initial data synchronization
requires the schema of the publisher database to be created on the
subscriber first. But then later updates sync newly created columns? I don't
recall any pages on logical replication mentioning this, up to this point.

Regards,
Koen De Groote

#2 Guillaume Lelarge
guillaume@lelarge.info
In reply to: PG Doc comments form (#1)
Re: Ambiguous description on new columns

Hi,

On Tue, May 21, 2024 at 12:40, PG Doc comments form <noreply@postgresql.org>
wrote:

The following documentation comment has been logged on the website:

Page:
https://www.postgresql.org/docs/16/logical-replication-col-lists.html
Description:

The documentation on this page mentions:

"If no column list is specified, any columns added later are automatically
replicated."

It feels ambiguous what this could mean. Does it mean:

1/ That if you alter the table on the publisher and add a new column, it
will be replicated

2/ If you add a column list later and add a column to it, it will be
replicated

In both cases, does the subscriber automatically create this column if it
wasn't there before? I recall reading that the initial data synchronization
requires the schema of the publisher database to be created on the
subscriber first. But then later updates sync newly created columns? I don't
recall any pages on logical replication mentioning this, up to this point.

It feels ambiguous. DDL commands are not replicated, so the new columns
don't appear automagically on the subscriber. You have to add them to the
subscriber. But values of new columns are replicated, whether or not you
have added the new columns on the subscriber.

Regards.

--
Guillaume.

#4 Peter Smith
smithpb2250@gmail.com
In reply to: PG Doc comments form (#1)
1 attachment(s)
Re: Ambiguous description on new columns

On Tue, May 21, 2024 at 8:40 PM PG Doc comments form
<noreply@postgresql.org> wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/16/logical-replication-col-lists.html
Description:

The documentation on this page mentions:

"If no column list is specified, any columns added later are automatically
replicated."

It feels ambiguous what this could mean. Does it mean:

1/ That if you alter the table on the publisher and add a new column, it
will be replicated

2/ If you add a column list later and add a column to it, it will be
replicated

In both cases, does the subscriber automatically create this column if it
wasn't there before?

No, the subscriber will not automatically create the column. That is
already clearly stated at the top of the same page you linked: "The table
on the subscriber side must have at least all the columns that are
published."

All that "If no column list..." paragraph was trying to say is:

CREATE PUBLICATION pub FOR TABLE T;

is not quite the same as:

CREATE PUBLICATION pub FOR TABLE T(a,b,c);

The difference is that in the 1st case, if you then ALTER TABLE T to
add a new column 'd', the 'd' data will automatically start replicating
without you having to do anything to either the PUBLICATION or the
SUBSCRIPTION. Of course, if TABLE T on the subscriber side does not
have a column 'd' then you'll get an error, because your subscriber
table needs to have *at least* all the replicated columns. (I
demonstrate this error below.)

Whereas in the 2nd case, even if you ALTER TABLE T to add a new column
'd', it won't be replicated, because 'd' was not named in the
PUBLICATION's column list.
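
For comparison, the 2nd case could be sketched roughly as follows. This is only an illustration, not something tested in this thread: it assumes PostgreSQL 15 or later (where column lists are available), the names t2, pub2 and sub2 are hypothetical, and the connection string is copied from the 1st-case example and will need adjusting for a real setup.

```sql
-- On the publisher (test_pub): publish only columns a, b, c of a hypothetical table t2.
CREATE TABLE t2 (a int, b int, c int);
CREATE PUBLICATION pub2 FOR TABLE t2 (a, b, c);

-- On the subscriber (test_sub): create the matching table and subscribe.
CREATE TABLE t2 (a int, b int, c int);
CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=test_pub' PUBLICATION pub2;

-- On the publisher: add a column that is NOT named in the column list.
ALTER TABLE t2 ADD COLUMN d int;
INSERT INTO t2 VALUES (5, 6, 7, 8);

-- On the subscriber: only a, b, c are expected to arrive. Since d is not
-- published, the subscriber table does not need a column d.
SELECT * FROM t2;

-- On the publisher: d is published only after the column list is changed
-- explicitly (add column d to the subscriber table first).
ALTER PUBLICATION pub2 SET TABLE t2 (a, b, c, d);
```

The point of the sketch is that a column list is a fixed selection: it stays as written until someone changes it with ALTER PUBLICATION, regardless of what happens to the table.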

~~~~

Here is an example of the 1st case. It shows that 'd' is automatically
replicated and also shows the subscriber-side error caused by the
missing column:

test_pub=# CREATE TABLE T(a int,b int, c int);
test_pub=# CREATE PUBLICATION pub FOR TABLE T;

test_sub=# CREATE TABLE T(a int,b int, c int);
test_sub=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=test_pub' PUBLICATION pub;

See the replication happening
test_pub=# INSERT INTO T VALUES (1,2,3);
test_sub=# SELECT * FROM t;
a | b | c
---+---+---
1 | 2 | 3
(1 row)

Now alter the publisher table T and insert some new data
test_pub=# ALTER TABLE T ADD COLUMN d int;
test_pub=# INSERT INTO T VALUES (5,6,7,8);

This will cause subscription errors like:
2024-05-22 11:53:19.098 AEST [16226] ERROR: logical replication
target relation "public.t" is missing replicated column: "d"
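
To clear that error, the missing column has to be added by hand on the subscriber, because DDL is not replicated. A minimal sketch, assuming the same table t and databases as in the example above:

```sql
-- On the subscriber (test_sub): add the missing column so that the table
-- again has at least all the published columns.
ALTER TABLE t ADD COLUMN d int;

-- After the apply worker retries, the pending row should be applied.
-- Rows replicated before the ALTER are expected to show NULL in d.
SELECT * FROM t;
```

Adding the column on the subscriber before adding it on the publisher should avoid the error altogether, since the subscriber table is allowed to have extra columns.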

~~~~

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

I attached a small patch to make the above change.

Thoughts?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v1-0001-Fix-minor-ambiguity.patch (application/octet-stream)
From 3db40ffbdb0270bc2508c8663ac6bea2c4ecf383 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 22 May 2024 12:18:14 +1000
Subject: [PATCH v1] Fix minor ambiguity

---
 doc/src/sgml/logical-replication.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index ec21306..4154610 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1288,7 +1288,7 @@ test_sub=# SELECT * FROM child ORDER BY a;
   </para>
 
   <para>
-   If no column list is specified, any columns added later are automatically
+   If no column list is specified, any columns added to the table later are automatically
    replicated. This means that having a column list which names all columns
    is not the same as having no column list at all.
   </para>
-- 
1.8.3.1

#5 David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Doc comments form (#1)
Re: Ambiguous description on new columns

On Tue, May 21, 2024 at 3:40 AM PG Doc comments form <noreply@postgresql.org>
wrote:

The following documentation comment has been logged on the website:

Page:
https://www.postgresql.org/docs/16/logical-replication-col-lists.html
Description:

The documentation on this page mentions:

"If no column list is specified, any columns added later are automatically
replicated."

It feels ambiguous what this could mean. Does it mean:

1/ That if you alter the table on the publisher and add a new column, it
will be replicated

Yes, this is the only thing in scope you can "add columns to later".

2/ If you add a column list later and add a column to it, it will be
replicated

I feel like we failed somewhere if the reader believes that it is possible
to alter a publication in this way.

David J.

#6 David G. Johnston
david.g.johnston@gmail.com
In reply to: Peter Smith (#4)
Re: Ambiguous description on new columns

On Tue, May 21, 2024 at 7:48 PM Peter Smith <smithpb2250@gmail.com> wrote:

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

Extended Before:

Each publication can optionally specify which columns of each table are
replicated to subscribers. The table on the subscriber side must have at
least all the columns that are published. If no column list is specified,
then all columns on the publisher are replicated. See CREATE PUBLICATION
for details on the syntax.

The choice of columns can be based on behavioral or performance reasons.
However, do not rely on this feature for security: a malicious subscriber
is able to obtain data from columns that are not specifically published. If
security is a consideration, protections can be applied at the publisher
side.

If no column list is specified, any columns added later are automatically
replicated. This means that having a column list which names all columns is
not the same as having no column list at all.

I'd suggest:

Each publication can optionally specify which columns of each table are
replicated to subscribers. The table on the subscriber side must have at
least all the columns that are published. If no column list is specified,
then all columns on the publisher[, present and future,] are replicated.
See CREATE PUBLICATION for details on the syntax.

...security...

...delete the entire "ambiguous" paragraph...

David J.

#7 Peter Smith
smithpb2250@gmail.com
In reply to: David G. Johnston (#6)
Re: Ambiguous description on new columns

On Wed, May 22, 2024 at 1:22 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, May 21, 2024 at 7:48 PM Peter Smith <smithpb2250@gmail.com> wrote:

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

Extended Before:

Each publication can optionally specify which columns of each table are replicated to subscribers. The table on the subscriber side must have at least all the columns that are published. If no column list is specified, then all columns on the publisher are replicated. See CREATE PUBLICATION for details on the syntax.

The choice of columns can be based on behavioral or performance reasons. However, do not rely on this feature for security: a malicious subscriber is able to obtain data from columns that are not specifically published. If security is a consideration, protections can be applied at the publisher side.

If no column list is specified, any columns added later are automatically replicated. This means that having a column list which names all columns is not the same as having no column list at all.

I'd suggest:

Each publication can optionally specify which columns of each table are replicated to subscribers. The table on the subscriber side must have at least all the columns that are published. If no column list is specified, then all columns on the publisher[, present and future,] are replicated. See CREATE PUBLICATION for details on the syntax.

...security...

...delete the entire "ambiguous" paragraph...

The "ambiguous" paragraph was trying to make the point that although
(a) having no column list at all and
(b) having a column list that names every table column

start off looking and working the same, don't be tricked into
thinking they are exactly equivalent, because if the table is ever
ALTERed later then the behaviour of those PUBLICATIONs begins to
differ.

~

Your suggested text doesn't seem quite as explicit about that subtle
point, but I guess since you can still infer the same meaning it is
fine.

But, maybe say "all columns on the published table" instead of "all
columns on the publisher".

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#8 David G. Johnston
david.g.johnston@gmail.com
In reply to: Peter Smith (#7)
Re: Ambiguous description on new columns

On Tuesday, May 21, 2024, Peter Smith <smithpb2250@gmail.com> wrote:

Each publication can optionally specify which columns of each table are

replicated to subscribers. The table on the subscriber side must have at
least all the columns that are published. If no column list is specified,
then all columns on the publisher[, present and future,] are replicated.
See CREATE PUBLICATION for details on the syntax.

...security...

...delete the entire "ambiguous" paragraph...

Your suggested text doesn't seem quite as explicit about that subtle
point, but I guess since you can still infer the same meaning it is
fine.

Right, it doesn’t seem that subtle so long as we point out what an absent
column list means. If you specify a column list you get exactly what you
asked for. It’s like listing columns in a SELECT. But if you don’t specify
a column list you get whatever is there at runtime. Which I presume also
means dropped columns no longer get replicated, but I haven’t tested that
and the docs don’t seem to cover column removal…

In contrast, if we don’t say this, one might reasonably assume that it
behaves like:
CREATE VIEW vw AS SELECT * FROM tbl;
when it doesn’t.
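
The view behaviour referred to here can be seen with a short sketch like the one below (tbl and vw are just the illustrative names from the line above). The column set of a view is fixed when the view is created, which is the opposite of what a publication without a column list does.

```sql
-- The "*" is expanded once, when the view is created.
CREATE TABLE tbl (a int, b int, c int);
CREATE VIEW vw AS SELECT * FROM tbl;

-- A column added later is visible in the table ...
ALTER TABLE tbl ADD COLUMN d int;
SELECT * FROM tbl;  -- columns a, b, c, d

-- ... but not through the view, which keeps its original column set.
SELECT * FROM vw;   -- columns a, b, c
```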

So yes, I do think saying “present and future” sufficiently covers the
intent of the removed paragraph and clearly ties that to the table columns
in response to this complaint.

But, maybe say "all columns on the published table" instead of "all
columns on the publisher".

Agreed.

David J.

#9 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Peter Smith (#4)
Re: Ambiguous description on new columns

On Wed, 2024-05-22 at 12:47 +1000, Peter Smith wrote:

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

I attached a small patch to make the above change.

+1 on that change.

Yours,
Laurenz Albe

#10 vignesh C
vignesh21@gmail.com
In reply to: Peter Smith (#4)
Re: Ambiguous description on new columns

On Wed, 22 May 2024 at 08:18, Peter Smith <smithpb2250@gmail.com> wrote:

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

I attached a small patch to make the above change.

Thoughts?

A minor suggestion, the rest looks good:
It would enhance clarity to include a line break following "If no
column list is specified, any columns added to the table later are":
-   If no column list is specified, any columns added later are automatically
+   If no column list is specified, any columns added to the table
later are automatically
    replicated. This means that having a column list which names all columns

Regards,
Vignesh

#11 Peter Smith
smithpb2250@gmail.com
In reply to: PG Doc comments form (#1)
1 attachment(s)
Re: Ambiguous description on new columns

On Wed, May 29, 2024 at 8:04 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, 22 May 2024 at 14:26, Peter Smith <smithpb2250@gmail.com> wrote:

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

I attached a small patch to make the above change.

A small recommendation:
It would enhance clarity to include a line break following "If no
column list is specified, any columns added to the table later are":
-   If no column list is specified, any columns added later are automatically
+   If no column list is specified, any columns added to the table
later are automatically
replicated. This means that having a column list which names all columns

Hi Vignesh,

IIUC you're saying my v1 patch *content* and rendering are OK, but you
only wanted the SGML text to be wrapped better, with lines under 80
characters. So I have attached a patch v2 with improved wrapping. If you
meant something different then please explain.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v2-0001-Fix-minor-ambiguity.patch (application/octet-stream)
From d55d78a0407b234bf434cb394437cb328f9b877a Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 30 May 2024 10:36:43 +1000
Subject: [PATCH v2] Fix minor ambiguity

---
 doc/src/sgml/logical-replication.sgml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index ec21306..5b06359 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1288,9 +1288,9 @@ test_sub=# SELECT * FROM child ORDER BY a;
   </para>
 
   <para>
-   If no column list is specified, any columns added later are automatically
-   replicated. This means that having a column list which names all columns
-   is not the same as having no column list at all.
+   If no column list is specified, any columns added to the table later are
+   automatically replicated. This means that having a column list which names
+   all columns is not the same as having no column list at all.
   </para>
 
   <para>
-- 
1.8.3.1

#12 vignesh C
vignesh21@gmail.com
In reply to: Peter Smith (#11)
Re: Ambiguous description on new columns

On Thu, 30 May 2024 at 06:21, Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, May 29, 2024 at 8:04 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, 22 May 2024 at 14:26, Peter Smith <smithpb2250@gmail.com> wrote:

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

I attached a small patch to make the above change.

A small recommendation:
It would enhance clarity to include a line break following "If no
column list is specified, any columns added to the table later are":
-   If no column list is specified, any columns added later are automatically
+   If no column list is specified, any columns added to the table
later are automatically
replicated. This means that having a column list which names all columns

Hi Vignesh,

IIUC you're saying my v1 patch *content* and rendering are OK, but you
only wanted the SGML text to be wrapped better, with lines under 80
characters. So I have attached a patch v2 with improved wrapping. If you
meant something different then please explain.

Yes, that is what I meant and the updated patch looks good.

Regards,
Vignesh

#13 vignesh C
vignesh21@gmail.com
In reply to: vignesh C (#12)
Re: Ambiguous description on new columns

On Fri, 31 May 2024 at 08:58, vignesh C <vignesh21@gmail.com> wrote:

On Thu, 30 May 2024 at 06:21, Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, May 29, 2024 at 8:04 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, 22 May 2024 at 14:26, Peter Smith <smithpb2250@gmail.com> wrote:

On Tue, May 21, 2024 at 8:40 PM PG Doc comments form
<noreply@postgresql.org> wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/16/logical-replication-col-lists.html
Description:

The documentation on this page mentions:

"If no column list is specified, any columns added later are automatically
replicated."

It feels ambiguous what this could mean. Does it mean:

1/ That if you alter the table on the publisher and add a new column, it
will be replicated

2/ If you add a column list later and add a column to it, it will be
replicated

In both cases, does the subscriber automatically create this column if it
wasn't there before?

No, the subscriber will not automatically create the column. That is
already clearly said at the top of the same page you linked "The table
on the subscriber side must have at least all the columns that are
published."

All that "If no column list..." paragraph was trying to say is:

CREATE PUBLICATION pub FOR TABLE T;

is not quite the same as:

CREATE PUBLICATION pub FOR TABLE T(a,b,c);

The difference is, in the 1st case if you then ALTER the TABLE T to
have a new column 'd' then that will automatically start replicating
the 'd' data without having to do anything to either the PUBLICATION
or the SUBSCRIPTION. Of course, if TABLE T at the subscriber side does
not have a column 'd' then you'll get an error because your subscriber
table needs to have *at least* all the replicated columns. (I
demonstrate this error below)

Whereas in the 2nd case, even if you ALTER the TABLE T to have
a new column 'd', it won't be replicated because 'd' was not
named in the PUBLICATION's column list.
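
To make the 2nd case concrete, here is a minimal sketch. It is not from the
original thread; it assumes the same test_pub database and table T used in
this message, and PostgreSQL 15 or later, where column lists are supported.
It shows that 'd' stays unpublished until the column list is changed
explicitly:

```sql
-- On the publisher (test_pub): publish only the named columns.
CREATE TABLE T(a int, b int, c int);
CREATE PUBLICATION pub FOR TABLE T(a,b,c);

-- Adding a column to the table does not change the column list.
ALTER TABLE T ADD COLUMN d int;
INSERT INTO T VALUES (5,6,7,8);   -- only a, b, c are sent to subscribers

-- To start replicating 'd', the column list has to be changed explicitly.
ALTER PUBLICATION pub SET TABLE T(a,b,c,d);
```

As in the 1st case, the subscriber's table T would need a column 'd' before
that last step takes effect, since the subscriber table must have at least
all the published columns.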

~~~~

Here is an example of the 1st case. It shows that 'd' is automatically
replicated, and it also shows the subscriber-side error caused by the
missing column:

test_pub=# CREATE TABLE T(a int,b int, c int);
test_pub=# CREATE PUBLICATION pub FOR TABLE T;

test_sub=# CREATE TABLE T(a int,b int, c int);
test_sub=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=test_pub' PUBLICATION pub;

Check that replication is working:
test_pub=# INSERT INTO T VALUES (1,2,3);
test_sub=# SELECT * FROM t;
 a | b | c
---+---+---
 1 | 2 | 3
(1 row)

Now alter the publisher table T and insert some new data
test_pub=# ALTER TABLE T ADD COLUMN d int;
test_pub=# INSERT INTO T VALUES (5,6,7,8);

This will cause subscription errors like:
2024-05-22 11:53:19.098 AEST [16226] ERROR: logical replication
target relation "public.t" is missing replicated column: "d"
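
One way to recover from this error, sketched here as an assumption rather
than something shown in the thread, is to add the missing column on the
subscriber. The apply worker retries after failing, so replication would be
expected to resume once the column exists:

```sql
-- On the subscriber (test_sub): add the column the publisher now sends.
ALTER TABLE T ADD COLUMN d int;

-- After the apply worker's next retry, the pending row should arrive.
SELECT * FROM t;
```

The row (1,2,3) was replicated before 'd' existed, so its 'd' value on the
subscriber would be NULL.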

~~~~

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

I attached a small patch to make the above change.

A small recommendation:
It would enhance clarity to include a line break following "If no
column list is specified, any columns added to the table later are":
-   If no column list is specified, any columns added later are automatically
+   If no column list is specified, any columns added to the table later are automatically
    replicated. This means that having a column list which names all columns

Hi Vignesh,

IIUC you're saying my v1 patch *content* and rendering is OK, but you
only wanted the SGML text to have better wrapping for < 80 chars
lines. So I have attached a patch v2 with improved wrapping. If you
meant something different then please explain.

Yes, that is what I meant and the updated patch looks good.

Adding Amit to get his opinion on the same.

Regards,
Vignesh

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#11)
Re: Ambiguous description on new columns

On Fri, May 31, 2024 at 10:54 PM Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, May 29, 2024 at 8:04 PM vignesh C <vignesh21@gmail.com> wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/16/logical-replication-col-lists.html
Description:

The documentation on this page mentions:

"If no column list is specified, any columns added later are automatically
replicated."

It feels ambiguous what this could mean. Does it mean:

1/ That if you alter the table on the publisher and add a new column, it
will be replicated

2/ If you add a column list later and add a column to it, it will be
replicated

In both cases, does the subscriber automatically create this column if it
wasn't there before?

~~~~

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

I attached a small patch to make the above change.

A small recommendation:
It would enhance clarity to include a line break following "If no
column list is specified, any columns added to the table later are":
-   If no column list is specified, any columns added later are automatically
+   If no column list is specified, any columns added to the table later are automatically
    replicated. This means that having a column list which names all columns

Hi Vignesh,

IIUC you're saying my v1 patch *content* and rendering is OK, but you
only wanted the SGML text to have better wrapping for < 80 chars
lines. So I have attached a patch v2 with improved wrapping. If you
meant something different then please explain.

Your patch is an improvement. Koen, does the proposed change make
things clear to you?

--
With Regards,
Amit Kapila.

#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#14)
Re: Ambiguous description on new columns

On Tue, Jun 4, 2024 at 11:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

IIUC you're saying my v1 patch *content* and rendering is OK, but you
only wanted the SGML text to have better wrapping for < 80 chars
lines. So I have attached a patch v2 with improved wrapping. If you
meant something different then please explain.

Your patch is an improvement. Koen, does the proposed change make
things clear to you?

I am planning to push and backpatch the latest patch by Peter Smith
unless there are any further comments or suggestions.

--
With Regards,
Amit Kapila.

#16Koen De Groote
kdg.dev@gmail.com
In reply to: Amit Kapila (#14)
Re: Ambiguous description on new columns

Yes, with this change it is clear to me that "columns added" applies to the
table on the publisher.

Regards,
Koen De Groote

On Tue, Jun 4, 2024 at 7:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, May 31, 2024 at 10:54 PM Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, May 29, 2024 at 8:04 PM vignesh C <vignesh21@gmail.com> wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/16/logical-replication-col-lists.html
Description:

The documentation on this page mentions:

"If no column list is specified, any columns added later are automatically
replicated."

It feels ambiguous what this could mean. Does it mean:

1/ That if you alter the table on the publisher and add a new column, it
will be replicated

2/ If you add a column list later and add a column to it, it will be
replicated

In both cases, does the subscriber automatically create this column if it
wasn't there before?

~~~~

I think the following small change will remove any ambiguity:

BEFORE
If no column list is specified, any columns added later are
automatically replicated.

SUGGESTION
If no column list is specified, any columns added to the table later
are automatically replicated.

~~

I attached a small patch to make the above change.

A small recommendation:
It would enhance clarity to include a line break following "If no
column list is specified, any columns added to the table later are":
-   If no column list is specified, any columns added later are automatically
+   If no column list is specified, any columns added to the table later are automatically
    replicated. This means that having a column list which names all columns

Hi Vignesh,

IIUC you're saying my v1 patch *content* and rendering is OK, but you
only wanted the SGML text to have better wrapping for < 80 chars
lines. So I have attached a patch v2 with improved wrapping. If you
meant something different then please explain.

Your patch is an improvement. Koen, does the proposed change make
things clear to you?

--
With Regards,
Amit Kapila.

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Koen De Groote (#16)
Re: Ambiguous description on new columns

On Fri, Jun 7, 2024 at 3:23 PM Koen De Groote <kdg.dev@gmail.com> wrote:

Yes, with this change it is clear to me that "columns added" applies to the table on the publisher.

Thanks for the confirmation. I have pushed and backpatched the fix.

--
With Regards,
Amit Kapila.