bogus: logical replication rows/cols combinations

Started by Alvaro Herrera over 3 years ago, 84 messages
#1Alvaro Herrera
alvherre@alvh.no-ip.org

I just noticed that publishing tables on multiple publications with
different row filters and column lists has somewhat surprising behavior.
To wit: if a column is published in any row-filtered publication, then
the values for that column are sent to the subscriber even for rows that
don't match the row filter, as long as the row matches the row filter
for any other publication, even if that other publication doesn't
include the column.

Here's an example.

Publisher:

create table uno (a int primary key, b int, c int);
create publication uno for table uno (a, b) where (a > 0);
create publication dos for table uno (a, c) where (a < 0);

Here, we specify: publish columns a,b for rows with positive a, and
publish columns a,c for rows with negative a.

What happened next will surprise you! Well, maybe not. On subscriber:

create table uno (a int primary key, b int, c int);
create subscription sub_uno connection 'port=55432 dbname=alvherre' publication uno,dos;

Publisher:
insert into uno values (1, 2, 3), (-1, 3, 4);

Publication 'uno' only has columns a and b, so row with a=1 should not
have value c=3. And publication 'dos' only has columns a and c, so row
with a=-1 should not have value b=3. But, on subscriber:

table uno;
a │ b │ c
────┼───┼───
1 │ 2 │ 3
-1 │ 3 │ 4

q.e.d.

I think this results from a too-simplistic view of how to mix multiple
publications with row filters and column lists. IIRC we are saying "if
column X appears in *any* publication, then the value is published",
period, and don't stop to evaluate the row filter corresponding to each
of those publications.

The desired result on subscriber is:

table uno;
a │ b │ c
────┼───┼───
1 │ 2 │
-1 │ │ 4
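To pin down these semantics, here is a toy model (plain Python, purely illustrative; names like `replicate` and `pubs` are made up, none of this is PostgreSQL code): a row is published if any publication's row filter matches it, the published columns are the union of the column lists of exactly the matching publications, and everything else arrives as NULL.

```python
# Toy model of the proposed semantics (not PostgreSQL code).
def replicate(row, publications):
    """publications: list of (column_set, row_filter) pairs."""
    published = set()
    for cols, matches in publications:
        if matches(row):
            published |= cols
    if not published:
        return None                      # no filter matched: row not sent
    return {c: (v if c in published else None) for c, v in row.items()}

# Publications uno (a,b | a>0) and dos (a,c | a<0) from the example:
pubs = [({'a', 'b'}, lambda r: r['a'] > 0),
        ({'a', 'c'}, lambda r: r['a'] < 0)]

print(replicate({'a': 1, 'b': 2, 'c': 3}, pubs))   # {'a': 1, 'b': 2, 'c': None}
print(replicate({'a': -1, 'b': 3, 'c': 4}, pubs))  # {'a': -1, 'b': None, 'c': 4}
```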

Thoughts?

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#2Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Alvaro Herrera (#1)
Re: bogus: logical replication rows/cols combinations

On 4/25/22 17:48, Alvaro Herrera wrote:

I just noticed that publishing tables on multiple publications with
different row filters and column lists has somewhat surprising behavior.
To wit: if a column is published in any row-filtered publication, then
the values for that column are sent to the subscriber even for rows that
don't match the row filter, as long as the row matches the row filter
for any other publication, even if that other publication doesn't
include the column.

Here's an example.

Publisher:

create table uno (a int primary key, b int, c int);
create publication uno for table uno (a, b) where (a > 0);
create publication dos for table uno (a, c) where (a < 0);

Here, we specify: publish columns a,b for rows with positive a, and
publish columns a,c for rows with negative a.

What happened next will surprise you! Well, maybe not. On subscriber:

create table uno (a int primary key, b int, c int);
create subscription sub_uno connection 'port=55432 dbname=alvherre' publication uno,dos;

Publisher:
insert into uno values (1, 2, 3), (-1, 3, 4);

Publication 'uno' only has columns a and b, so row with a=1 should not
have value c=3. And publication 'dos' only has columns a and c, so row
with a=-1 should not have value b=3. But, on subscriber:

table uno;
a │ b │ c
────┼───┼───
1 │ 2 │ 3
-1 │ 3 │ 4

q.e.d.

I think this results from a too-simplistic view of how to mix multiple
publications with row filters and column lists. IIRC we are saying "if
column X appears in *any* publication, then the value is published",
period, and don't stop to evaluate the row filter corresponding to each
of those publications.

Right.

The desired result on subscriber is:

table uno;
a │ b │ c
────┼───┼───
1 │ 2 │
-1 │ │ 4

Thoughts?

I'm not quite sure which of the two behaviors is more "desirable". In a
way, it's somewhat similar to publish_as_relid, which is also calculated
not considering which of the row filters match?

But maybe you're right and it should behave the way you propose ... the
example I have in mind is a use case replicating table with two types of
rows - sensitive and non-sensitive. For sensitive, we replicate only
some of the columns, for non-sensitive we replicate everything. Which
could be implemented as two publications

create publication sensitive_rows
for table t (a, b) where (is_sensitive);

create publication non_sensitive_rows
for table t where (not is_sensitive);

But the way it's implemented now, we'll always replicate all columns,
because the second publication has no column list.
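A toy model of today's behavior (plain Python, purely illustrative; `replicate_today` and friends are made-up names): the row filters are OR-ed into one expression, and the column list is the union over *all* publications regardless of which filter matched, with "no column list" meaning all columns - so the sensitive rows leak column c.

```python
# Toy model of the behavior as currently implemented (not PostgreSQL code).
ALL_COLUMNS = {'a', 'b', 'c', 'is_sensitive'}

def replicate_today(row, pubs):
    """pubs: list of (column_set_or_None, row_filter) pairs."""
    if not any(f(row) for _, f in pubs):        # single OR-ed row filter
        return None
    published = set()
    for cols, _ in pubs:                        # union, filters ignored
        published |= ALL_COLUMNS if cols is None else cols
    return {c: v for c, v in row.items() if c in published}

pubs = [({'a', 'b'}, lambda r: r['is_sensitive']),    # sensitive_rows
        (None, lambda r: not r['is_sensitive'])]      # non_sensitive_rows

row = {'a': 1, 'b': 2, 'c': 'secret', 'is_sensitive': True}
print(replicate_today(row, pubs))   # column c is replicated despite the filter
```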

Changing this to behave the way you expect would be quite difficult,
because at the moment we build a single OR expression from all the row
filters. We'd have to keep the individual expressions, so that we can
build a column list for each of them (in order to ignore those that
don't match).

We'd have to remove various other optimizations - for example we can't
just discard row filters if we found "no_filter" publication. Or more
precisely, we'd have to consider column lists too.

In other words, we'd have to merge pgoutput_column_list_init into
pgoutput_row_filter_init, and then modify pgoutput_row_filter to
evaluate the row filters one by one, and build the column list.
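The shape that evaluation would take, sketched in plain Python (hypothetical, not pgoutput internals; `row_filter_with_columns` and `pub_infos` are invented names): keep each publication's (row filter, column list) pair instead of one combined OR expression, evaluate the filters one by one, and build the column list from just the publications that matched.

```python
# Hypothetical sketch of per-publication row-filter evaluation.
def row_filter_with_columns(row, pub_infos):
    """pub_infos: list of (row_filter, column_set) pairs, one per
    publication. Returns (publish?, column list for this row)."""
    publish, columns = False, set()
    for matches, cols in pub_infos:
        if matches(row):
            publish = True
            columns |= cols
    return publish, columns

pub_infos = [(lambda r: r['a'] > 0, {'a', 'b'}),
             (lambda r: r['a'] < 0, {'a', 'c'})]

publish, cols = row_filter_with_columns({'a': 1, 'b': 2, 'c': 3}, pub_infos)
print(publish, sorted(cols))   # True ['a', 'b']
```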

I can take a stab at it, but it seems strange to not apply the same
logic to evaluation of publish_as_relid. I wonder what Amit thinks about
this, as he wrote the row filter stuff.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#2)
Re: bogus: logical replication rows/cols combinations

On Tue, Apr 26, 2022 at 4:00 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/25/22 17:48, Alvaro Herrera wrote:

The desired result on subscriber is:

table uno;
a │ b │ c
────┼───┼───
1 │ 2 │
-1 │ │ 4

Thoughts?

I'm not quite sure which of the two behaviors is more "desirable". In a
way, it's somewhat similar to publish_as_relid, which is also calculated
not considering which of the row filters match?

Right, or in other words, we check all publications to decide it and
similar is the case for publication actions which are also computed
independently for all publications.

But maybe you're right and it should behave the way you propose ... the
example I have in mind is a use case replicating table with two types of
rows - sensitive and non-sensitive. For sensitive, we replicate only
some of the columns, for non-sensitive we replicate everything. Which
could be implemented as two publications

create publication sensitive_rows
for table t (a, b) where (is_sensitive);

create publication non_sensitive_rows
for table t where (not is_sensitive);

But the way it's implemented now, we'll always replicate all columns,
because the second publication has no column list.

Changing this to behave the way you expect would be quite difficult,
because at the moment we build a single OR expression from all the row
filters. We'd have to keep the individual expressions, so that we can
build a column list for each of them (in order to ignore those that
don't match).

We'd have to remove various other optimizations - for example we can't
just discard row filters if we found "no_filter" publication.

I don't think that is the right way. We need some way to combine
expressions and I feel the current behavior is sane. I mean to say
that even if there is one publication that has no filter (column/row),
we should publish all rows with all columns. Now, as mentioned above
combining row filters or column lists for all publications appears to
be consistent with what we already do and seems correct behavior to
me.

To me, it appears that the method used to decide whether a particular
table is published or not is also similar to what we do for row
filters or column lists. Even if there is one publication that
publishes all tables, we consider the current table to be published
irrespective of whether other publications have published that table
or not.

Or more
precisely, we'd have to consider column lists too.

In other words, we'd have to merge pgoutput_column_list_init into
pgoutput_row_filter_init, and then modify pgoutput_row_filter to
evaluate the row filters one by one, and build the column list.

Hmm, I think even if we want to do something here, we also need to
think about how to achieve similar behavior for initial tablesync
which will be more tricky.

I can take a stab at it, but it seems strange to not apply the same
logic to evaluation of publish_as_relid.

Yeah, the current behavior seems to be consistent with what we already do.

I wonder what Amit thinks about
this, as he wrote the row filter stuff.

I feel we can explain a bit more about this in docs. We already have
some explanation of how row filters are combined [1]. We can probably
add a few examples for column lists.

[1]: https://www.postgresql.org/docs/devel/logical-replication-row-filter.html#LOGICAL-REPLICATION-ROW-FILTER-COMBINING

--
With Regards,
Amit Kapila.

#4Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#3)
Re: bogus: logical replication rows/cols combinations

On Wed, Apr 27, 2022 at 10:25:50AM +0530, Amit Kapila wrote:

I feel we can explain a bit more about this in docs. We already have
some explanation of how row filters are combined [1]. We can probably
add a few examples for column lists.

I am not completely sure exactly what we should do here, but this
stuff needs to be at least discussed. I have added an open item.
--
Michael

#5Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tomas Vondra (#2)
Re: bogus: logical replication rows/cols combinations

On 2022-Apr-26, Tomas Vondra wrote:

I'm not quite sure which of the two behaviors is more "desirable". In a
way, it's somewhat similar to publish_as_relid, which is also calculated
not considering which of the row filters match?

I grepped doc/src/sgml for `publish_as_relid` and found no hits, so
I suppose it's not a user-visible feature as such.

But maybe you're right and it should behave the way you propose ... the
example I have in mind is a use case replicating table with two types of
rows - sensitive and non-sensitive. For sensitive, we replicate only
some of the columns, for non-sensitive we replicate everything.

Exactly. If we blindly publish row/column values that aren't in *any*
publications, this may lead to leaking protected values.

Changing this to behave the way you expect would be quite difficult,
because at the moment we build a single OR expression from all the row
filters. We'd have to keep the individual expressions, so that we can
build a column list for each of them (in order to ignore those that
don't match).

I think we should do that, yeah.

I can take a stab at it, but it seems strange to not apply the same
logic to evaluation of publish_as_relid. I wonder what Amit thinks about
this, as he wrote the row filter stuff.

By grepping publicationcmds.c, it seems that publish_as_relid refers to
the ancestor partitioned table that is used for column list and
rowfilter determination, when a partition is being published as part of
it. I don't think these things are exactly parallel.

... In fact I think they are quite orthogonal: probably you should be
able to publish a partitioned table in two publications, with different
rowfilters and different column lists (which can come from the
topmost partitioned table), and each partition should still work in the
way I describe above.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"[PostgreSQL] is a great group; in my opinion it is THE best open source
development communities in existence anywhere." (Lamar Owen)

#6Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#3)
Re: bogus: logical replication rows/cols combinations

On 2022-Apr-27, Amit Kapila wrote:

On Tue, Apr 26, 2022 at 4:00 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I can take a stab at it, but it seems strange to not apply the same
logic to evaluation of publish_as_relid.

Yeah, the current behavior seems to be consistent with what we already
do.

Sorry, this argument makes no sense to me. The combination of both
features is not consistent, and both features are new.
'publish_as_relid' is an implementation detail. If the implementation
fails to follow the feature design, then the implementation must be
fixed ... not the design!

IMO, we should first determine how we want row filters and column lists
to work when used in conjunction -- for relations (sets of rows) in a
general sense. After we have done that, then we can use that design to
drive how we want partitioned tables to be handled for it. Keep in mind
that when users see a partitioned table, what they first see is a table.
They want all their tables to work in pretty much the same way --
partitioned or not partitioned. The fact that a table is partitioned
should affect as little as possible the way it interacts with other
features.

Now, another possibility is to say "naah, this is too hard", or even
"naah, there's no time to write all that for this release". That might
be okay, but in that case let's add an implementation restriction to
ensure that we don't paint ourselves in a corner regarding what is
reasonable behavior. For example, an easy restriction might be: if a
table is in multiple publications with mismatching row filters/column
lists, then a subscriber is not allowed to subscribe to both
publications. (Maybe this restriction isn't exactly what we need so
that it actually implements what we need, not sure). Then, if/when in
the future we implement this correctly, we can lift the restriction.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La conclusión que podemos sacar de esos estudios es que
no podemos sacar ninguna conclusión de ellos" (Tanenbaum)

#7houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#3)
RE: bogus: logical replication rows/cols combinations

On Wednesday, April 27, 2022 12:56 PM From: Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 26, 2022 at 4:00 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/25/22 17:48, Alvaro Herrera wrote:

The desired result on subscriber is:

table uno;
a │ b │ c
────┼───┼───
1 │ 2 │
-1 │ │ 4

Thoughts?

I'm not quite sure which of the two behaviors is more "desirable". In a
way, it's somewhat similar to publish_as_relid, which is also calculated
not considering which of the row filters match?

Right, or in other words, we check all publications to decide it and
similar is the case for publication actions which are also computed
independently for all publications.

But maybe you're right and it should behave the way you propose ... the
example I have in mind is a use case replicating table with two types of
rows - sensitive and non-sensitive. For sensitive, we replicate only
some of the columns, for non-sensitive we replicate everything. Which
could be implemented as two publications

create publication sensitive_rows
for table t (a, b) where (is_sensitive);

create publication non_sensitive_rows
for table t where (not is_sensitive);

But the way it's implemented now, we'll always replicate all columns,
because the second publication has no column list.

Changing this to behave the way you expect would be quite difficult,
because at the moment we build a single OR expression from all the row
filters. We'd have to keep the individual expressions, so that we can
build a column list for each of them (in order to ignore those that
don't match).

We'd have to remove various other optimizations - for example we can't
just discard row filters if we found "no_filter" publication.

I don't think that is the right way. We need some way to combine
expressions and I feel the current behavior is sane. I mean to say
that even if there is one publication that has no filter (column/row),
we should publish all rows with all columns. Now, as mentioned above
combining row filters or column lists for all publications appears to
be consistent with what we already do and seems correct behavior to
me.

To me, it appears that the method used to decide whether a particular
table is published or not is also similar to what we do for row
filters or column lists. Even if there is one publication that
publishes all tables, we consider the current table to be published
irrespective of whether other publications have published that table
or not.

Or more
precisely, we'd have to consider column lists too.

In other words, we'd have to merge pgoutput_column_list_init into
pgoutput_row_filter_init, and then modify pgoutput_row_filter to
evaluate the row filters one by one, and build the column list.

Hmm, I think even if we want to do something here, we also need to
think about how to achieve similar behavior for initial tablesync
which will be more tricky.

I think it could be difficult to make the initial tablesync behave the same.
Currently, we build a "COPY" command to do the table sync, and I am not sure
how to change the "COPY" query to achieve the expected behavior here.

BTW, For the use case mentioned here:
"""
replicating table with two types of
rows - sensitive and non-sensitive. For sensitive, we replicate only
some of the columns, for non-sensitive we replicate everything.
"""

One approach to do this is to create two subscriptions and two
publications which seems a workaround.
-----
create publication uno for table uno (a, b) where (a > 0);
create publication dos for table uno (a, c) where (a < 0);

create subscription sub_uno connection 'port=55432 dbname=alvherre' publication uno;
create subscription sub_dos connection 'port=55432 dbname=alvherre' publication dos;
-----

Best regards,
Hou zj

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#5)
Re: bogus: logical replication rows/cols combinations

On Wed, Apr 27, 2022 at 3:13 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-Apr-26, Tomas Vondra wrote:

I'm not quite sure which of the two behaviors is more "desirable". In a
way, it's somewhat similar to publish_as_relid, which is also calculated
not considering which of the row filters match?

I grepped doc/src/sgml for `publish_as_relid` and found no hits, so
I suppose it's not a user-visible feature as such.

`publish_as_relid` is computed based on 'publish_via_partition_root'
setting of publication which is a user-visible feature.

But maybe you're right and it should behave the way you propose ... the
example I have in mind is a use case replicating table with two types of
rows - sensitive and non-sensitive. For sensitive, we replicate only
some of the columns, for non-sensitive we replicate everything.

Exactly. If we blindly publish row/column values that aren't in *any*
publications, this may lead to leaking protected values.

Changing this to behave the way you expect would be quite difficult,
because at the moment we build a single OR expression from all the row
filters. We'd have to keep the individual expressions, so that we can
build a column list for each of them (in order to ignore those that
don't match).

I think we should do that, yeah.

This can hit the performance as we need to evaluate each expression
for each row.

I can take a stab at it, but it seems strange to not apply the same
logic to evaluation of publish_as_relid. I wonder what Amit thinks about
this, as he wrote the row filter stuff.

By grepping publicationcmds.c, it seems that publish_as_relid refers to
the ancestor partitioned table that is used for column list and
rowfilter determination, when a partition is being published as part of
it.

Yeah, this is true when the corresponding publication has set
'publish_via_partition_root' as true.

I don't think these things are exactly parallel.

Currently, when the subscription has multiple publications, we combine
the objects, and actions of those publications. It happens for
'publish_via_partition_root', publication actions, tables, column
lists, or row filters. I think the whole design works on this idea
even the initial table sync. I think it might need a major change
(which I am not sure about at this stage) if we want to make the
initial sync also behave similarly to what you are proposing.

I feel it would be much easier to create two different subscriptions
as mentioned by Hou-San [1] for the case you are talking about if the
user really needs something like that.

... In fact I think they are quite orthogonal: probably you should be
able to publish a partitioned table in two publications, with different
rowfilters and different column lists (which can come from the
topmost partitioned table), and each partition should still work in the
way I describe above.

We consider the column lists or row filters for either the partition
(on which the current operation is performed) or partitioned table
based on 'publish_via_partition_root' parameter of publication.

[1]: /messages/by-id/OS0PR01MB5716B82315A067F1D78F247E94FA9@OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

#9Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#8)
Re: bogus: logical replication rows/cols combinations

On 2022-Apr-27, Amit Kapila wrote:

On Wed, Apr 27, 2022 at 3:13 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Changing this to behave the way you expect would be quite difficult,
because at the moment we build a single OR expression from all the row
filters. We'd have to keep the individual expressions, so that we can
build a column list for each of them (in order to ignore those that
don't match).

I think we should do that, yeah.

This can hit the performance as we need to evaluate each expression
for each row.

So we do things because they are easy and fast, rather than because they
work correctly?

... In fact I think they are quite orthogonal: probably you should be
able to publish a partitioned table in two publications, with different
rowfilters and different column lists (which can come from the
topmost partitioned table), and each partition should still work in the
way I describe above.

We consider the column lists or row filters for either the partition
(on which the current operation is performed) or partitioned table
based on 'publish_via_partition_root' parameter of publication.

OK, but this isn't relevant to what I wrote.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#10Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#9)
Re: bogus: logical replication rows/cols combinations

On Wed, Apr 27, 2022 at 4:27 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-Apr-27, Amit Kapila wrote:

On Wed, Apr 27, 2022 at 3:13 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Changing this to behave the way you expect would be quite difficult,
because at the moment we build a single OR expression from all the row
filters. We'd have to keep the individual expressions, so that we can
build a column list for each of them (in order to ignore those that
don't match).

I think we should do that, yeah.

This can hit the performance as we need to evaluate each expression
for each row.

So we do things because they are easy and fast, rather than because they
work correctly?

The point is I am not sure if what you are saying is better behavior
than current but if others feel it is better then we can try to do
something for it. In the above sentence, I just wanted to say that it
will impact performance but if that is required then sure we should do
it that way.

--
With Regards,
Amit Kapila.

#11Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#10)
1 attachment(s)
Re: bogus: logical replication rows/cols combinations

Hi,

so I've been looking at tweaking the code so that the behavior matches
Alvaro's expectations. It passes check-world, but I'm not claiming it's
anywhere near committable - the purpose is mostly to give a better idea of
how invasive the change is, etc.

As described earlier, this abandons the idea of building a single OR
expression from all the row filters (per action), and replaces that with
a list of per-publication info (struct PublicationInfo), combining info
about both row filters and column lists.

This means we can't initialize the row filters and column lists
separately, but at the same time. So pgoutput_row_filter_init was
modified to initialize both, and pgoutput_column_list_init was removed.

With this info, we can calculate column lists only for publications with
matching row filters, which is what the modified pgoutput_row_filter
does (the calculated column list is returned through a parameter).

This however does not remove the 'columns' from RelationSyncEntry
entirely. We still need that "superset" column list when sending schema.

Imagine two publications, one replicating (a,b) and the other (a,c),
maybe depending on row filter. send_relation_and_attrs() needs to send
info about all three attributes (a,b,c), i.e. about any attribute that
might end up being replicated.

We might try to be smarter and send the exact schema needed by the next
operation, i.e. when inserting (a,b) we'd make sure the last schema we
sent was (a,b) and invalidate/resend it otherwise. But that might easily
result in "thrashing" where we send the schema and the next operation
invalidates it right away because it needs a different schema.

But there's another reason to do it like this - it seems desirable to
actually reset columns that don't match the calculated column list. Using
Alvaro's example, it seems reasonable to expect these two transactions
to produce the same result on the subscriber:

1) insert (a,b) + update to (a,c)

insert into uno values (1, 2, 3);
update uno set a = -1 where a = 1;

2) insert (a,c)

insert into uno values (-1, 2, 3);

But to do this, the update actually needs to send (-1,NULL,3).

So in this case we'll have (a,b,c) column list in RelationSyncEntry, and
only attributes on this list will be sent as part of the schema. And for
DML actions we'll calculate either (a,b) or (a,c) depending on the row
filter, and missing attributes will be replicated as NULL.
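A tiny model of that wire behavior (plain Python, purely illustrative; `wire_tuple` and `SCHEMA` are invented names, not the patch itself): the relation schema covers the union of all column lists (a,b,c), and each DML message fills only the columns chosen by the matching row filter(s), sending NULL for the rest.

```python
# Toy model of the wire behavior described above (not the patch itself).
SCHEMA = ('a', 'b', 'c')   # "superset" column list kept for the relation

pubs = [({'a', 'b'}, lambda r: r['a'] > 0),   # publication uno
        ({'a', 'c'}, lambda r: r['a'] < 0)]   # publication dos

def wire_tuple(row):
    """One slot per schema column; NULL wherever the per-row column list
    (built from the matching row filters) doesn't include the column."""
    cols = set()
    for c, f in pubs:
        if f(row):
            cols |= c
    return tuple(row[name] if name in cols else None for name in SCHEMA)

# update uno set a = -1 where a = 1: the new row (-1,2,3) matches (a < 0),
# so the update ships (-1, NULL, 3), resetting b on the subscriber - the
# same tuple a direct insert of (-1,2,3) would produce:
print(wire_tuple({'a': -1, 'b': 2, 'c': 3}))   # (-1, None, 3)
```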

I haven't done any tests of how this affects performance, but I have a
couple of thoughts regarding that:

a) I kinda doubt the optimizations would really matter in practice,
because how likely is it that one relation is in many publications (in
the same subscription)?

b) Did anyone actually do some benchmarks that I could repeat, to see
how much worse this is?

c) AFAICS we could optimize this in at least some common cases. For
example we could combine the entries with matching row filters, and/or
column filters.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

rework-column-row-filtering.patch (text/x-patch; charset=UTF-8)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index ff8513e2d2..41c5e3413f 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -33,7 +33,9 @@ static void logicalrep_write_attrs(StringInfo out, Relation rel,
 								   Bitmapset *columns);
 static void logicalrep_write_tuple(StringInfo out, Relation rel,
 								   TupleTableSlot *slot,
-								   bool binary, Bitmapset *columns);
+								   bool binary,
+								   Bitmapset *schema_columns,
+								   Bitmapset *columns);
 static void logicalrep_read_attrs(StringInfo in, LogicalRepRelation *rel);
 static void logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple);
 
@@ -412,7 +414,8 @@ logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn)
  */
 void
 logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
-						TupleTableSlot *newslot, bool binary, Bitmapset *columns)
+						TupleTableSlot *newslot, bool binary,
+						Bitmapset *schema_columns, Bitmapset *columns)
 {
 	pq_sendbyte(out, LOGICAL_REP_MSG_INSERT);
 
@@ -424,7 +427,8 @@ logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 	pq_sendint32(out, RelationGetRelid(rel));
 
 	pq_sendbyte(out, 'N');		/* new tuple follows */
-	logicalrep_write_tuple(out, rel, newslot, binary, columns);
+	logicalrep_write_tuple(out, rel, newslot, binary,
+						   schema_columns, columns);
 }
 
 /*
@@ -457,7 +461,8 @@ logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
 void
 logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 						TupleTableSlot *oldslot, TupleTableSlot *newslot,
-						bool binary, Bitmapset *columns)
+						bool binary, Bitmapset *schema_columns,
+						Bitmapset *columns)
 {
 	pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE);
 
@@ -478,11 +483,12 @@ logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 			pq_sendbyte(out, 'O');	/* old tuple follows */
 		else
 			pq_sendbyte(out, 'K');	/* old key follows */
-		logicalrep_write_tuple(out, rel, oldslot, binary, NULL);
+		logicalrep_write_tuple(out, rel, oldslot, binary, NULL, NULL);
 	}
 
 	pq_sendbyte(out, 'N');		/* new tuple follows */
-	logicalrep_write_tuple(out, rel, newslot, binary, columns);
+	logicalrep_write_tuple(out, rel, newslot, binary,
+						   schema_columns, columns);
 }
 
 /*
@@ -551,7 +557,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel,
 	else
 		pq_sendbyte(out, 'K');	/* old key follows */
 
-	logicalrep_write_tuple(out, rel, oldslot, binary, NULL);
+	logicalrep_write_tuple(out, rel, oldslot, binary, NULL, NULL);
 }
 
 /*
@@ -766,7 +772,8 @@ logicalrep_read_typ(StringInfo in, LogicalRepTyp *ltyp)
  */
 static void
 logicalrep_write_tuple(StringInfo out, Relation rel, TupleTableSlot *slot,
-					   bool binary, Bitmapset *columns)
+					   bool binary,
+					   Bitmapset *schema_columns, Bitmapset *columns)
 {
 	TupleDesc	desc;
 	Datum	   *values;
@@ -783,7 +790,7 @@ logicalrep_write_tuple(StringInfo out, Relation rel, TupleTableSlot *slot,
 		if (att->attisdropped || att->attgenerated)
 			continue;
 
-		if (!column_in_column_list(att->attnum, columns))
+		if (!column_in_column_list(att->attnum, schema_columns))
 			continue;
 
 		nliveatts++;
@@ -804,10 +811,23 @@ logicalrep_write_tuple(StringInfo out, Relation rel, TupleTableSlot *slot,
 		if (att->attisdropped || att->attgenerated)
 			continue;
 
-		if (!column_in_column_list(att->attnum, columns))
+		/*
+		 * Columns that are not in schema (union of column lists) should
+		 * be skipped entirely.
+		 */
+		if (!column_in_column_list(att->attnum, schema_columns))
 			continue;
 
-		if (isnull[i])
+		/*
+		 * For columns not in the column list (derived considering row filters),
+		 * we just send NULL.
+		 *
+		 * XXX Not sure this is quite correct, though. Imagine you replicate
+		 * values for columns (A,B), but an update changes the row filter. Can
+		 * we send a NULL that would overwrite a "proper" value sent earlier?
+		 */
+		if (isnull[i] ||
+			!column_in_column_list(att->attnum, columns))
 		{
 			pq_sendbyte(out, LOGICALREP_COLUMN_NULL);
 			continue;
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index b197bfd565..85456e6d9f 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -137,13 +137,13 @@ typedef struct RelationSyncEntry
 	PublicationActions pubactions;
 
 	/*
-	 * ExprState array for row filter. Different publication actions don't
-	 * allow multiple expressions to always be combined into one, because
-	 * updates or deletes restrict the column in expression to be part of the
-	 * replica identity index whereas inserts do not have this restriction, so
-	 * there is one ExprState per publication action.
+	 * Info about row filters and column lists for each publication this
+	 * relation is included in. We keep a list with per-publication info in
+	 * order to calculate the appropriate column list depending on which row
+	 * filter(s) match. Each element contains an ExprState for the row filter
+	 * and the column list.
 	 */
-	ExprState  *exprstate[NUM_ROWFILTER_PUBACTIONS];
+	dlist_head	pubinfos;		/* one per publication */
 	EState	   *estate;			/* executor state used for row filter */
 	TupleTableSlot *new_slot;	/* slot for storing new tuple */
 	TupleTableSlot *old_slot;	/* slot for storing old tuple */
@@ -208,6 +208,29 @@ typedef struct PGOutputTxnData
 								 * been sent */
 }		PGOutputTxnData;
 
+/*
+ * Info used to track and evaluate row filters for each publication the
+ * relation is included in, and to calculate the column list.
+ */
+typedef struct PublicationInfo
+{
+	/* doubly-linked list */
+	dlist_node	node;
+
+	/* publication OID (XXX not really needed) */
+	Oid			oid;
+
+	/* row filter (expression state) */
+	ExprState  *rowfilter;
+
+	/* column list */
+	Bitmapset  *columns;
+
+	/* actions published by the publication */
+	PublicationActions	pubactions;
+
+} PublicationInfo;
+
 /* Map used to remember which relation schemas we sent. */
 static HTAB *RelationSyncCache = NULL;
 
@@ -235,12 +258,8 @@ static bool pgoutput_row_filter_exec_expr(ExprState *state,
 static bool pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 								TupleTableSlot **new_slot_ptr,
 								RelationSyncEntry *entry,
-								ReorderBufferChangeType *action);
-
-/* column list routines */
-static void pgoutput_column_list_init(PGOutputData *data,
-									  List *publications,
-									  RelationSyncEntry *entry);
+								ReorderBufferChangeType *action,
+								Bitmapset **column_list);
 
 /*
  * Specify output plugin callbacks
@@ -822,18 +841,24 @@ pgoutput_ensure_entry_cxt(PGOutputData *data, RelationSyncEntry *entry)
 }
 
 /*
- * Initialize the row filter.
+ * Initialize the row filter and column list.
+ *
+ * Prepare information (ExprState, etc.) used to evaluate per-publication row
+ * filters and column lists.
+ *
+ * We also calculate a "total" column list as a union of all per-publication
+ * column lists, irrespective of row filters. This is used to send the
+ * schema for the relsync entry, etc.
  */
 static void
 pgoutput_row_filter_init(PGOutputData *data, List *publications,
 						 RelationSyncEntry *entry)
 {
 	ListCell   *lc;
-	List	   *rfnodes[] = {NIL, NIL, NIL};	/* One per pubaction */
-	bool		no_filter[] = {false, false, false};	/* One per pubaction */
 	MemoryContext oldctx;
-	int			idx;
-	bool		has_filter = true;
+	bool		all_columns = false;
+
+	dlist_init(&entry->pubinfos);
 
 	/*
 	 * Find if there are any row filters for this relation. If there are, then
@@ -855,7 +880,12 @@ pgoutput_row_filter_init(PGOutputData *data, List *publications,
 		Publication *pub = lfirst(lc);
 		HeapTuple	rftuple = NULL;
 		Datum		rfdatum = 0;
+		Datum		cfdatum = 0;
 		bool		pub_no_filter = false;
+		bool		pub_no_list = false;
+		Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
+
+		PublicationInfo *pubinfo = NULL;
 
 		if (pub->alltables)
 		{
@@ -865,6 +895,7 @@ pgoutput_row_filter_init(PGOutputData *data, List *publications,
 			 * publications it does).
 			 */
 			pub_no_filter = true;
+			pub_no_list = true;
 		}
 		else
 		{
@@ -881,191 +912,75 @@ pgoutput_row_filter_init(PGOutputData *data, List *publications,
 				rfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple,
 										  Anum_pg_publication_rel_prqual,
 										  &pub_no_filter);
+
+				/*
+				 * Lookup the column list attribute.
+				 *
+				 * Null indicates no list.
+				 */
+				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple,
+										  Anum_pg_publication_rel_prattrs,
+										  &pub_no_list);
 			}
 			else
 			{
 				pub_no_filter = true;
+				pub_no_list = true;
 			}
 		}
 
-		if (pub_no_filter)
-		{
-			if (rftuple)
-				ReleaseSysCache(rftuple);
-
-			no_filter[PUBACTION_INSERT] |= pub->pubactions.pubinsert;
-			no_filter[PUBACTION_UPDATE] |= pub->pubactions.pubupdate;
-			no_filter[PUBACTION_DELETE] |= pub->pubactions.pubdelete;
-
-			/*
-			 * Quick exit if all the DML actions are publicized via this
-			 * publication.
-			 */
-			if (no_filter[PUBACTION_INSERT] &&
-				no_filter[PUBACTION_UPDATE] &&
-				no_filter[PUBACTION_DELETE])
-			{
-				has_filter = false;
-				break;
-			}
-
-			/* No additional work for this publication. Next one. */
-			continue;
-		}
-
-		/* Form the per pubaction row filter lists. */
-		if (pub->pubactions.pubinsert && !no_filter[PUBACTION_INSERT])
-			rfnodes[PUBACTION_INSERT] = lappend(rfnodes[PUBACTION_INSERT],
-												TextDatumGetCString(rfdatum));
-		if (pub->pubactions.pubupdate && !no_filter[PUBACTION_UPDATE])
-			rfnodes[PUBACTION_UPDATE] = lappend(rfnodes[PUBACTION_UPDATE],
-												TextDatumGetCString(rfdatum));
-		if (pub->pubactions.pubdelete && !no_filter[PUBACTION_DELETE])
-			rfnodes[PUBACTION_DELETE] = lappend(rfnodes[PUBACTION_DELETE],
-												TextDatumGetCString(rfdatum));
-
-		ReleaseSysCache(rftuple);
-	}							/* loop all subscribed publications */
-
-	/* Clean the row filter */
-	for (idx = 0; idx < NUM_ROWFILTER_PUBACTIONS; idx++)
-	{
-		if (no_filter[idx])
-		{
-			list_free_deep(rfnodes[idx]);
-			rfnodes[idx] = NIL;
-		}
-	}
-
-	if (has_filter)
-	{
-		Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
-
 		pgoutput_ensure_entry_cxt(data, entry);
 
-		/*
-		 * Now all the filters for all pubactions are known. Combine them when
-		 * their pubactions are the same.
-		 */
 		oldctx = MemoryContextSwitchTo(entry->entry_cxt);
-		entry->estate = create_estate_for_relation(relation);
-		for (idx = 0; idx < NUM_ROWFILTER_PUBACTIONS; idx++)
-		{
-			List	   *filters = NIL;
-			Expr	   *rfnode;
 
-			if (rfnodes[idx] == NIL)
-				continue;
-
-			foreach(lc, rfnodes[idx])
-				filters = lappend(filters, stringToNode((char *) lfirst(lc)));
-
-			/* combine the row filter and cache the ExprState */
-			rfnode = make_orclause(filters);
-			entry->exprstate[idx] = ExecPrepareExpr(rfnode, entry->estate);
-		}						/* for each pubaction */
-		MemoryContextSwitchTo(oldctx);
-
-		RelationClose(relation);
-	}
-}
+		pubinfo = (PublicationInfo *) palloc0(sizeof(PublicationInfo));
 
-/*
- * Initialize the column list.
- */
-static void
-pgoutput_column_list_init(PGOutputData *data, List *publications,
-						  RelationSyncEntry *entry)
-{
-	ListCell   *lc;
+		pubinfo->oid = pub->oid;
+		pubinfo->pubactions = pub->pubactions;
 
-	/*
-	 * Find if there are any column lists for this relation. If there are,
-	 * build a bitmap merging all the column lists.
-	 *
-	 * All the given publication-table mappings must be checked.
-	 *
-	 * Multiple publications might have multiple column lists for this relation.
-	 *
-	 * FOR ALL TABLES and FOR ALL TABLES IN SCHEMA implies "don't use column
-	 * list" so it takes precedence.
-	 */
-	foreach(lc, publications)
-	{
-		Publication *pub = lfirst(lc);
-		HeapTuple	cftuple = NULL;
-		Datum		cfdatum = 0;
+		if (!pub_no_filter)
+		{
+			entry->estate = create_estate_for_relation(relation);
 
-		/*
-		 * Assume there's no column list. Only if we find pg_publication_rel
-		 * entry with a column list we'll switch it to false.
-		 */
-		bool		pub_no_list = true;
+			pubinfo->rowfilter
+				= ExecPrepareExpr(stringToNode(TextDatumGetCString(rfdatum)),
+								  entry->estate);
+		}
 
 		/*
-		 * If the publication is FOR ALL TABLES then it is treated the same as if
-		 * there are no column lists (even if other publications have a list).
+		 * Build the column list bitmap in the per-entry context.
+		 *
+		 * We need to merge column lists from all publications, so we
+		 * update the same bitmapset. If the column list is null, we
+		 * interpret it as replicating all columns.
 		 */
-		if (!pub->alltables)
+		pubinfo->columns = NULL;
+		if (!pub_no_list)	/* when not null */
 		{
-			/*
-			 * Check for the presence of a column list in this publication.
-			 *
-			 * Note: If we find no pg_publication_rel row, it's a publication
-			 * defined for a whole schema, so it can't have a column list, just
-			 * like a FOR ALL TABLES publication.
-			 */
-			cftuple = SearchSysCache2(PUBLICATIONRELMAP,
-									  ObjectIdGetDatum(entry->publish_as_relid),
-									  ObjectIdGetDatum(pub->oid));
-
-			if (HeapTupleIsValid(cftuple))
-			{
-				/*
-				 * Lookup the column list attribute.
-				 *
-				 * Note: We update the pub_no_list value directly, because if
-				 * the value is NULL, we have no list (and vice versa).
-				 */
-				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
-										  Anum_pg_publication_rel_prattrs,
-										  &pub_no_list);
-
-				/*
-				 * Build the column list bitmap in the per-entry context.
-				 *
-				 * We need to merge column lists from all publications, so we
-				 * update the same bitmapset. If the column list is null, we
-				 * interpret it as replicating all columns.
-				 */
-				if (!pub_no_list)	/* when not null */
-				{
-					pgoutput_ensure_entry_cxt(data, entry);
+			pubinfo->columns = pub_collist_to_bitmapset(pubinfo->columns,
+														cfdatum,
+														entry->entry_cxt);
 
-					entry->columns = pub_collist_to_bitmapset(entry->columns,
-															  cfdatum,
-															  entry->entry_cxt);
-				}
-			}
+			entry->columns = pub_collist_to_bitmapset(entry->columns,
+													  cfdatum,
+													  entry->entry_cxt);
 		}
+		else
+			all_columns = true;
 
-		/*
-		 * Found a publication with no column list, so we're done. But first
-		 * discard column list we might have from preceding publications.
-		 */
-		if (pub_no_list)
-		{
-			if (cftuple)
-				ReleaseSysCache(cftuple);
+		if (HeapTupleIsValid(rftuple))
+			ReleaseSysCache(rftuple);
 
-			bms_free(entry->columns);
-			entry->columns = NULL;
+		MemoryContextSwitchTo(oldctx);
+		RelationClose(relation);
 
-			break;
-		}
+		dlist_push_tail(&entry->pubinfos, &pubinfo->node);
 
-		ReleaseSysCache(cftuple);
 	}							/* loop all subscribed publications */
+
+	/* any of the publications replicates all columns */
+	if (all_columns)
+		entry->columns = NULL;
 }
 
 /*
@@ -1115,7 +1030,8 @@ init_tuple_slot(PGOutputData *data, Relation relation,
 }
 
 /*
- * Change is checked against the row filter if any.
+ * Check the change against the row filters (if any), and calculate the column
+ * list applicable to the operation (based on which row filters match).
  *
  * Returns true if the change is to be replicated, else false.
  *
@@ -1136,6 +1052,8 @@ init_tuple_slot(PGOutputData *data, Relation relation,
  *
  * The new action is updated in the action parameter.
  *
+ * The calculated column list is returned in the column_list parameter.
+ *
  * The new slot could be updated when transforming the UPDATE into INSERT,
  * because the original new tuple might not have column values from the replica
  * identity.
@@ -1167,17 +1085,21 @@ init_tuple_slot(PGOutputData *data, Relation relation,
 static bool
 pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 					TupleTableSlot **new_slot_ptr, RelationSyncEntry *entry,
-					ReorderBufferChangeType *action)
+					ReorderBufferChangeType *action, Bitmapset **column_list)
 {
 	TupleDesc	desc;
 	int			i;
-	bool		old_matched,
-				new_matched,
+	bool		old_matched_any = false,
+				new_matched_any = false,
 				result;
-	TupleTableSlot *tmp_new_slot;
+	TupleTableSlot *tmp_new_slot = NULL;
 	TupleTableSlot *new_slot = *new_slot_ptr;
-	ExprContext *ecxt;
-	ExprState  *filter_exprstate;
+	dlist_iter	iter;
+	bool		matching = false;
+
+	/* Column list calculated from publications matching the row filter. */
+	Bitmapset  *columns = NULL;
+	bool		all_columns = false;
 
 	/*
 	 * We need this map to avoid relying on ReorderBufferChangeType enums
@@ -1195,115 +1117,191 @@ pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 
 	Assert(new_slot || old_slot);
 
-	/* Get the corresponding row filter */
-	filter_exprstate = entry->exprstate[map_changetype_pubaction[*action]];
+	dlist_foreach(iter, &entry->pubinfos)
+	{
+		ExprContext *ecxt;
+		bool		old_matched,
+					new_matched;
 
-	/* Bail out if there is no row filter */
-	if (!filter_exprstate)
-		return true;
+		PublicationInfo  *pubinfo
+			= dlist_container(PublicationInfo, node, iter.cur);
 
-	elog(DEBUG3, "table \"%s.%s\" has row filter",
-		 get_namespace_name(RelationGetNamespace(relation)),
-		 RelationGetRelationName(relation));
+		/* ignore publications not replicating this action */
+		if ((*action == REORDER_BUFFER_CHANGE_INSERT) &&
+			(!pubinfo->pubactions.pubinsert))
+			continue;
+		else if ((*action == REORDER_BUFFER_CHANGE_UPDATE) &&
+				 (!pubinfo->pubactions.pubupdate))
+			continue;
+		else if ((*action == REORDER_BUFFER_CHANGE_DELETE) &&
+			(!pubinfo->pubactions.pubdelete))
+			continue;
 
-	ResetPerTupleExprContext(entry->estate);
+		if (!pubinfo->rowfilter)
+		{
+			matching = true;
 
-	ecxt = GetPerTupleExprContext(entry->estate);
+			/*
+			 * Update/merge the column list.
+			 *
+			 * If the publication has no column list, we interpret it as a list
+			 * with all columns. Otherwise we just add it to the bitmap.
+			 *
+			 * FIXME This is repeated in three places. Maybe refactor?
+			 */
+			if (!pubinfo->columns)
+			{
+				all_columns = true;
+				bms_free(columns);
+				columns = NULL;
+			}
+			else if (!all_columns)
+				columns = bms_union(columns, pubinfo->columns);
 
-	/*
-	 * For the following occasions where there is only one tuple, we can
-	 * evaluate the row filter for that tuple and return.
-	 *
-	 * For inserts, we only have the new tuple.
-	 *
-	 * For updates, we can have only a new tuple when none of the replica
-	 * identity columns changed and none of those columns have external data
-	 * but we still need to evaluate the row filter for the new tuple as the
-	 * existing values of those columns might not match the filter. Also, users
-	 * can use constant expressions in the row filter, so we anyway need to
-	 * evaluate it for the new tuple.
-	 *
-	 * For deletes, we only have the old tuple.
-	 */
-	if (!new_slot || !old_slot)
-	{
-		ecxt->ecxt_scantuple = new_slot ? new_slot : old_slot;
-		result = pgoutput_row_filter_exec_expr(filter_exprstate, ecxt);
+			continue;
+		}
 
-		return result;
-	}
+		elog(DEBUG3, "table \"%s.%s\" has row filter",
+			get_namespace_name(RelationGetNamespace(relation)),
+			RelationGetRelationName(relation));
 
-	/*
-	 * Both the old and new tuples must be valid only for updates and need to
-	 * be checked against the row filter.
-	 */
-	Assert(map_changetype_pubaction[*action] == PUBACTION_UPDATE);
+		ResetPerTupleExprContext(entry->estate);
+
+		ecxt = GetPerTupleExprContext(entry->estate);
 
-	slot_getallattrs(new_slot);
-	slot_getallattrs(old_slot);
+		/*
+		 * For the following occasions where there is only one tuple, we can
+		 * evaluate the row filter for that tuple and return.
+		 *
+		 * For inserts, we only have the new tuple.
+		 *
+		 * For updates, we can have only a new tuple when none of the replica
+		 * identity columns changed and none of those columns have external data
+		 * but we still need to evaluate the row filter for the new tuple as the
+		 * existing values of those columns might not match the filter. Also, users
+		 * can use constant expressions in the row filter, so we anyway need to
+		 * evaluate it for the new tuple.
+		 *
+		 * For deletes, we only have the old tuple.
+		 */
+		if (!new_slot || !old_slot)
+		{
+			ecxt->ecxt_scantuple = new_slot ? new_slot : old_slot;
+			result = pgoutput_row_filter_exec_expr(pubinfo->rowfilter, ecxt);
 
-	tmp_new_slot = NULL;
-	desc = RelationGetDescr(relation);
+			matching |= result;
 
-	/*
-	 * The new tuple might not have all the replica identity columns, in which
-	 * case it needs to be copied over from the old tuple.
-	 */
-	for (i = 0; i < desc->natts; i++)
-	{
-		Form_pg_attribute att = TupleDescAttr(desc, i);
+			/*
+			 * FIXME refactor to reuse this code in multiple places
+			 *
+			 * XXX Should this only update the column list when using the new
+			 * slot? If evaluating the old slot, that's a delete, no?
+			 */
+			if (result)
+			{
+				if (!pubinfo->columns)
+				{
+					all_columns = true;
+					bms_free(columns);
+					columns = NULL;
+				}
+				else if (!all_columns)
+				{
+					columns = bms_union(columns, pubinfo->columns);
+				}
+			}
+
+			continue;
+		}
 
 		/*
-		 * if the column in the new tuple or old tuple is null, nothing to do
+		 * Both the old and new tuples must be valid only for updates and need to
+		 * be checked against the row filter.
 		 */
-		if (new_slot->tts_isnull[i] || old_slot->tts_isnull[i])
-			continue;
+		Assert(map_changetype_pubaction[*action] == PUBACTION_UPDATE);
+
+		slot_getallattrs(new_slot);
+		slot_getallattrs(old_slot);
+
+		tmp_new_slot = NULL;
+		desc = RelationGetDescr(relation);
 
 		/*
-		 * Unchanged toasted replica identity columns are only logged in the
-		 * old tuple. Copy this over to the new tuple. The changed (or WAL
-		 * Logged) toast values are always assembled in memory and set as
-		 * VARTAG_INDIRECT. See ReorderBufferToastReplace.
+		 * The new tuple might not have all the replica identity columns, in which
+		 * case it needs to be copied over from the old tuple.
 		 */
-		if (att->attlen == -1 &&
-			VARATT_IS_EXTERNAL_ONDISK(new_slot->tts_values[i]) &&
-			!VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
+		for (i = 0; i < desc->natts; i++)
 		{
-			if (!tmp_new_slot)
+			Form_pg_attribute att = TupleDescAttr(desc, i);
+
+			/*
+			 * if the column in the new tuple or old tuple is null, nothing to do
+			 */
+			if (new_slot->tts_isnull[i] || old_slot->tts_isnull[i])
+				continue;
+
+			/*
+			 * Unchanged toasted replica identity columns are only logged in the
+			 * old tuple. Copy this over to the new tuple. The changed (or WAL
+			 * Logged) toast values are always assembled in memory and set as
+			 * VARTAG_INDIRECT. See ReorderBufferToastReplace.
+			 */
+			if (att->attlen == -1 &&
+				VARATT_IS_EXTERNAL_ONDISK(new_slot->tts_values[i]) &&
+				!VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
 			{
-				tmp_new_slot = MakeSingleTupleTableSlot(desc, &TTSOpsVirtual);
-				ExecClearTuple(tmp_new_slot);
+				if (!tmp_new_slot)
+				{
+					tmp_new_slot = MakeSingleTupleTableSlot(desc, &TTSOpsVirtual);
+					ExecClearTuple(tmp_new_slot);
 
-				memcpy(tmp_new_slot->tts_values, new_slot->tts_values,
-					   desc->natts * sizeof(Datum));
-				memcpy(tmp_new_slot->tts_isnull, new_slot->tts_isnull,
-					   desc->natts * sizeof(bool));
-			}
+					memcpy(tmp_new_slot->tts_values, new_slot->tts_values,
+						   desc->natts * sizeof(Datum));
+					memcpy(tmp_new_slot->tts_isnull, new_slot->tts_isnull,
+						   desc->natts * sizeof(bool));
+				}
 
-			tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
-			tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
+				tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
+				tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
+			}
 		}
-	}
 
-	ecxt->ecxt_scantuple = old_slot;
-	old_matched = pgoutput_row_filter_exec_expr(filter_exprstate, ecxt);
+		ecxt->ecxt_scantuple = old_slot;
 
-	if (tmp_new_slot)
-	{
-		ExecStoreVirtualTuple(tmp_new_slot);
-		ecxt->ecxt_scantuple = tmp_new_slot;
-	}
-	else
-		ecxt->ecxt_scantuple = new_slot;
+		old_matched = pgoutput_row_filter_exec_expr(pubinfo->rowfilter, ecxt);
+		old_matched_any |= old_matched;
+
+		if (tmp_new_slot)
+		{
+			ExecStoreVirtualTuple(tmp_new_slot);
+			ecxt->ecxt_scantuple = tmp_new_slot;
+		}
+		else
+			ecxt->ecxt_scantuple = new_slot;
 
-	new_matched = pgoutput_row_filter_exec_expr(filter_exprstate, ecxt);
+		new_matched = pgoutput_row_filter_exec_expr(pubinfo->rowfilter, ecxt);
+		new_matched_any |= new_matched;
 
-	/*
-	 * Case 1: if both tuples don't match the row filter, bailout. Send
-	 * nothing.
-	 */
-	if (!old_matched && !new_matched)
-		return false;
+		/*
+		 * Case 1: if both tuples don't match the row filter, bailout. Send
+		 * nothing.
+		 */
+		if (!old_matched && !new_matched)
+			continue;	/* continue with the next row filter */
+
+		/*
+		 * At least one of the tuples matched this row filter, so merge
+		 * this publication's column list (no list means all columns).
+		 */
+		if (!pubinfo->columns)
+		{
+			all_columns = true;
+			bms_free(columns);
+			columns = NULL;
+		}
+		else if (!all_columns && new_matched)
+			columns = bms_union(columns, pubinfo->columns);
+	}
 
 	/*
 	 * Case 2: if the old tuple doesn't satisfy the row filter but the new
@@ -1314,9 +1312,10 @@ pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 	 * while inserting the tuple in the downstream node, we have all the
 	 * required column values.
 	 */
-	if (!old_matched && new_matched)
+	if (!old_matched_any && new_matched_any)
 	{
 		*action = REORDER_BUFFER_CHANGE_INSERT;
+		matching = true;
 
 		if (tmp_new_slot)
 			*new_slot_ptr = tmp_new_slot;
@@ -1329,15 +1328,18 @@ pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 	 * This transformation does not require another tuple. The Old tuple will
 	 * be used for DELETE.
 	 */
-	else if (old_matched && !new_matched)
+	else if (old_matched_any && !new_matched_any)
+	{
 		*action = REORDER_BUFFER_CHANGE_DELETE;
+		matching = true;
+	}
+	else if (old_matched_any && new_matched_any)
+		matching = true;
 
-	/*
-	 * Case 4: if both tuples match the row filter, transformation isn't
-	 * required. (*action is default UPDATE).
-	 */
+	if (column_list)
+		*column_list = columns;
 
-	return true;
+	return matching;
 }
 
 /*
@@ -1359,6 +1361,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	ReorderBufferChangeType action = change->action;
 	TupleTableSlot *old_slot = NULL;
 	TupleTableSlot *new_slot = NULL;
+	Bitmapset	   *columns = NULL;
 
 	if (!is_publishable_relation(relation))
 		return;
@@ -1423,7 +1426,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 
 			/* Check row filter */
 			if (!pgoutput_row_filter(targetrel, NULL, &new_slot, relentry,
-									 &action))
+									 &action, &columns))
 				break;
 
 			/*
@@ -1444,7 +1447,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 
 			OutputPluginPrepareWrite(ctx, true);
 			logicalrep_write_insert(ctx->out, xid, targetrel, new_slot,
-									data->binary, relentry->columns);
+									data->binary,
+									relentry->columns, columns);
 			OutputPluginWrite(ctx, true);
 			break;
 		case REORDER_BUFFER_CHANGE_UPDATE:
@@ -1483,7 +1487,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 
 			/* Check row filter */
 			if (!pgoutput_row_filter(targetrel, old_slot, &new_slot,
-									 relentry, &action))
+									 relentry, &action, &columns))
 				break;
 
 			/* Send BEGIN if we haven't yet */
@@ -1503,12 +1507,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				case REORDER_BUFFER_CHANGE_INSERT:
 					logicalrep_write_insert(ctx->out, xid, targetrel,
 											new_slot, data->binary,
-											relentry->columns);
+											relentry->columns, columns);
 					break;
 				case REORDER_BUFFER_CHANGE_UPDATE:
 					logicalrep_write_update(ctx->out, xid, targetrel,
 											old_slot, new_slot, data->binary,
-											relentry->columns);
+											relentry->columns, columns);
 					break;
 				case REORDER_BUFFER_CHANGE_DELETE:
 					logicalrep_write_delete(ctx->out, xid, targetrel,
@@ -1547,7 +1551,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 
 				/* Check row filter */
 				if (!pgoutput_row_filter(targetrel, old_slot, &new_slot,
-										 relentry, &action))
+										 relentry, &action, NULL))
 					break;
 
 				/* Send BEGIN if we haven't yet */
@@ -1977,7 +1981,7 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 			entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
 		entry->new_slot = NULL;
 		entry->old_slot = NULL;
-		memset(entry->exprstate, 0, sizeof(entry->exprstate));
+		dlist_init(&entry->pubinfos);
 		entry->entry_cxt = NULL;
 		entry->publish_as_relid = InvalidOid;
 		entry->columns = NULL;
@@ -2056,7 +2060,7 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 
 		entry->entry_cxt = NULL;
 		entry->estate = NULL;
-		memset(entry->exprstate, 0, sizeof(entry->exprstate));
+		dlist_init(&entry->pubinfos);
 
 		/*
 		 * Build publication cache. We can't use one provided by relcache as
@@ -2192,11 +2196,8 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 			/* Initialize the tuple slot and map */
 			init_tuple_slot(data, relation, entry);
 
-			/* Initialize the row filter */
+			/* Initialize the row filter and column list info */
 			pgoutput_row_filter_init(data, rel_publications, entry);
-
-			/* Initialize the column list */
-			pgoutput_column_list_init(data, rel_publications, entry);
 		}
 
 		list_free(pubids);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index a771ab8ff3..7d45b94d3c 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -209,12 +209,14 @@ extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
 extern void logicalrep_write_insert(StringInfo out, TransactionId xid,
 									Relation rel,
 									TupleTableSlot *newslot,
-									bool binary, Bitmapset *columns);
+									bool binary, Bitmapset *schema_columns,
+									Bitmapset *columns);
 extern LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup);
 extern void logicalrep_write_update(StringInfo out, TransactionId xid,
 									Relation rel,
 									TupleTableSlot *oldslot,
-									TupleTableSlot *newslot, bool binary, Bitmapset *columns);
+									TupleTableSlot *newslot, bool binary,
+									Bitmapset *schema_columns, Bitmapset *columns);
 extern LogicalRepRelId logicalrep_read_update(StringInfo in,
 											  bool *has_oldtuple, LogicalRepTupleData *oldtup,
 											  LogicalRepTupleData *newtup);
#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#11)
Re: bogus: logical replication rows/cols combinations

On Thu, Apr 28, 2022 at 3:26 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

so I've been looking at tweaking the code so that the behavior matches
Alvaro's expectations. It passes check-world but I'm not claiming it's
nowhere near commitable - the purpose is mostly to give better idea of
how invasive the change is etc.

I was just skimming through the patch and didn't find anything related
to initial sync handling. I feel the behavior should be the same for
initial sync and replication.

--
With Regards,
Amit Kapila.

#13Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Alvaro Herrera (#6)
Re: bogus: logical replication rows/cols combinations

On 27.04.22 11:53, Alvaro Herrera wrote:

Now, another possibility is to say "naah, this is too hard", or even
"naah, there's no time to write all that for this release". That might
be okay, but in that case let's add an implementation restriction to
ensure that we don't paint ourselves in a corner regarding what is
reasonable behavior. For example, an easy restriction might be: if a
table is in multiple publications with mismatching row filters/column
lists, then a subscriber is not allowed to subscribe to both
publications. (Maybe this restriction isn't exactly what we need so
that it actually implements what we need, not sure). Then, if/when in
the future we implement this correctly, we can lift the restriction.

My feeling is also that we should prohibit the combinations that we
cannot make work correctly.

#14Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#8)
Re: bogus: logical replication rows/cols combinations

On 27.04.22 12:33, Amit Kapila wrote:

Currently, when the subscription has multiple publications, we combine
the objects, and actions of those publications. It happens for
'publish_via_partition_root', publication actions, tables, column
lists, or row filters. I think the whole design works on this idea
even the initial table sync. I think it might need a major change
(which I am not sure about at this stage) if we want to make the
initial sync also behave similarly to what you are proposing.

If one publication says "publish if insert" and another publication says
"publish if update", then the combination of that is clearly "publish if
insert or update". Similarly, if one publication says "WHERE (foo)" and
one says "WHERE (bar)", then the combination is "WHERE (foo OR bar)".

But if one publication says "publish columns a and b if condition-X" and
another publication says "publish columns a and c if not-condition-X",
then the combination is clearly *not* "publish columns a, b, c if true".
That is not logical, in the literal sense of that word.
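This can be made concrete with a toy sketch (plain Python, not PostgreSQL code) of the per-row semantics being argued for, using the example from the start of the thread: publish columns (a, b) WHERE a > 0, and columns (a, c) WHERE a < 0. All names here are illustrative.

```python
# Toy model of per-publication row filter + column list evaluation:
# a column's value is published only if some publication that lists
# the column has a matching row filter; other columns become NULL.
pubs = [(lambda r: r["a"] > 0, {"a", "b"}),   # publication "uno"
        (lambda r: r["a"] < 0, {"a", "c"})]   # publication "dos"

def publish(row):
    cols = set()
    for rowfilter, collist in pubs:
        if rowfilter(row):
            cols |= collist          # union only the matching column lists
    if not cols:
        return None                  # no filter matched: row not replicated
    return {k: (v if k in cols else None) for k, v in row.items()}

print(publish({"a": 1, "b": 2, "c": 3}))    # {'a': 1, 'b': 2, 'c': None}
print(publish({"a": -1, "b": 3, "c": 4}))   # {'a': -1, 'b': None, 'c': 4}
```

This reproduces the "desired result on subscriber" from the first message, rather than the current behavior of publishing all three column values.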

I wonder how we handle the combination of

pub1: publish=insert WHERE (foo)
pub2: publish=update WHERE (bar)

I think it would be incorrect if the combination is

pub1, pub2: publish=insert,update WHERE (foo OR bar).
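The difference can be sketched the same toy way (illustrative Python, not anything in the tree); `foo` and `bar` below are stand-ins for the two filters:

```python
# Contrast the naive combination (merge actions, OR the filters) with
# per-publication evaluation for pub1 (publish=insert WHERE foo) and
# pub2 (publish=update WHERE bar).
def foo(row):
    return row > 0

def bar(row):
    return row < 0

# Naive combination: publish=insert,update WHERE (foo OR bar).
def naive_publish(action, row):
    return action in ("insert", "update") and (foo(row) or bar(row))

# Per-publication: a change is published only if some publication
# replicates that action AND that publication's own filter matches.
pubs = [("insert", foo), ("update", bar)]

def per_pub_publish(action, row):
    return any(act == action and filt(row) for act, filt in pubs)

# An UPDATE of a row with value 1 matches foo but not bar; only pub2
# replicates updates, so arguably it should not be published at all.
print(naive_publish("update", 1))    # True
print(per_pub_publish("update", 1))  # False
```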

#15Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#12)
1 attachment(s)
Re: bogus: logical replication rows/cols combinations

On 4/28/22 05:17, Amit Kapila wrote:

On Thu, Apr 28, 2022 at 3:26 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

so I've been looking at tweaking the code so that the behavior matches
Alvaro's expectations. It passes check-world but I'm not claiming it's
nowhere near commitable - the purpose is mostly to give better idea of
how invasive the change is etc.

I was just skimming through the patch and didn't find anything related
to initial sync handling. I feel the behavior should be the same for
initial sync and replication.

Yeah, sorry for not mentioning that - my goal was to explore and try
getting the behavior in regular replication right first, before
attempting to do the same thing in tablesync.

Attached is a patch doing the same thing in tablesync. The overall idea
is to generate copy statement with CASE expressions, applying filters to
individual columns. For Alvaro's example, this generates something like

SELECT
(CASE WHEN (a < 0) OR (a > 0) THEN a ELSE NULL END) AS a,
(CASE WHEN (a > 0) THEN b ELSE NULL END) AS b,
(CASE WHEN (a < 0) THEN c ELSE NULL END) AS c
FROM uno WHERE (a < 0) OR (a > 0)

And that seems to work fine. Similarly to regular replication, we have to
use both the "total" column list (union of per-publication lists) and the
per-publication info (row filter + column list), but that's expected.

There are a couple of options for how we might optimize this for common
cases. For example, if there's just a single publication, there's no need to
generate the CASE expressions - the WHERE filter will do the trick.
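For illustration, here is a rough sketch (hypothetical Python, not the patch's actual tablesync code; all names are made up) of how such a query might be assembled from per-publication (row filter, column list) pairs:

```python
# Build a CASE-based copy query: each column's value is kept only when
# some publication that lists the column has a matching row filter,
# and rows matching no filter at all are excluded entirely.
pubs = [("(a > 0)", ["a", "b"]),   # publication uno
        ("(a < 0)", ["a", "c"])]   # publication dos

def build_copy_query(table, columns, pubs):
    select_list = []
    for col in columns:
        # OR together the filters of publications that list this column.
        filters = [rf for rf, cols in pubs if col in cols]
        cond = " OR ".join(filters)
        select_list.append(
            "(CASE WHEN %s THEN %s ELSE NULL END) AS %s" % (cond, col, col))
    where = " OR ".join(rf for rf, _ in pubs)
    return "SELECT %s FROM %s WHERE %s" % (", ".join(select_list), table, where)

print(build_copy_query("uno", ["a", "b", "c"], pubs))
```

For the example table this yields the same shape of query as shown above, modulo the order of the OR'd filters.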

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

rework-column-row-filtering-v2.patch (text/x-patch)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index ff8513e2d29..41c5e3413f6 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -33,7 +33,9 @@ static void logicalrep_write_attrs(StringInfo out, Relation rel,
 								   Bitmapset *columns);
 static void logicalrep_write_tuple(StringInfo out, Relation rel,
 								   TupleTableSlot *slot,
-								   bool binary, Bitmapset *columns);
+								   bool binary,
+								   Bitmapset *schema_columns,
+								   Bitmapset *columns);
 static void logicalrep_read_attrs(StringInfo in, LogicalRepRelation *rel);
 static void logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple);
 
@@ -412,7 +414,8 @@ logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn)
  */
 void
 logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
-						TupleTableSlot *newslot, bool binary, Bitmapset *columns)
+						TupleTableSlot *newslot, bool binary,
+						Bitmapset *schema_columns, Bitmapset *columns)
 {
 	pq_sendbyte(out, LOGICAL_REP_MSG_INSERT);
 
@@ -424,7 +427,8 @@ logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 	pq_sendint32(out, RelationGetRelid(rel));
 
 	pq_sendbyte(out, 'N');		/* new tuple follows */
-	logicalrep_write_tuple(out, rel, newslot, binary, columns);
+	logicalrep_write_tuple(out, rel, newslot, binary,
+						   schema_columns, columns);
 }
 
 /*
@@ -457,7 +461,8 @@ logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
 void
 logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 						TupleTableSlot *oldslot, TupleTableSlot *newslot,
-						bool binary, Bitmapset *columns)
+						bool binary, Bitmapset *schema_columns,
+						Bitmapset *columns)
 {
 	pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE);
 
@@ -478,11 +483,12 @@ logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 			pq_sendbyte(out, 'O');	/* old tuple follows */
 		else
 			pq_sendbyte(out, 'K');	/* old key follows */
-		logicalrep_write_tuple(out, rel, oldslot, binary, NULL);
+		logicalrep_write_tuple(out, rel, oldslot, binary, NULL, NULL);
 	}
 
 	pq_sendbyte(out, 'N');		/* new tuple follows */
-	logicalrep_write_tuple(out, rel, newslot, binary, columns);
+	logicalrep_write_tuple(out, rel, newslot, binary,
+						   schema_columns, columns);
 }
 
 /*
@@ -551,7 +557,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel,
 	else
 		pq_sendbyte(out, 'K');	/* old key follows */
 
-	logicalrep_write_tuple(out, rel, oldslot, binary, NULL);
+	logicalrep_write_tuple(out, rel, oldslot, binary, NULL, NULL);
 }
 
 /*
@@ -766,7 +772,8 @@ logicalrep_read_typ(StringInfo in, LogicalRepTyp *ltyp)
  */
 static void
 logicalrep_write_tuple(StringInfo out, Relation rel, TupleTableSlot *slot,
-					   bool binary, Bitmapset *columns)
+					   bool binary,
+					   Bitmapset *schema_columns, Bitmapset *columns)
 {
 	TupleDesc	desc;
 	Datum	   *values;
@@ -783,7 +790,7 @@ logicalrep_write_tuple(StringInfo out, Relation rel, TupleTableSlot *slot,
 		if (att->attisdropped || att->attgenerated)
 			continue;
 
-		if (!column_in_column_list(att->attnum, columns))
+		if (!column_in_column_list(att->attnum, schema_columns))
 			continue;
 
 		nliveatts++;
@@ -804,10 +811,23 @@ logicalrep_write_tuple(StringInfo out, Relation rel, TupleTableSlot *slot,
 		if (att->attisdropped || att->attgenerated)
 			continue;
 
-		if (!column_in_column_list(att->attnum, columns))
+		/*
+		 * Columns that are not in the schema (the union of column lists)
+		 * should be skipped entirely.
+		 */
+		if (!column_in_column_list(att->attnum, schema_columns))
 			continue;
 
-		if (isnull[i])
+		/*
+		 * For columns not in the column list (derived considering row
+		 * filters) we just send NULL.
+		 *
+		 * XXX Not sure this is quite correct, though. Imagine we replicate
+		 * values for columns (A,B), but the row stops matching that filter.
+		 * Can we send NULL that would overwrite "proper" value replicated earlier?
+		 */
+		if (isnull[i] ||
+			!column_in_column_list(att->attnum, columns))
 		{
 			pq_sendbyte(out, LOGICALREP_COLUMN_NULL);
 			continue;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 49ceec3bdc8..fd547e16f4a 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -126,6 +126,20 @@ static bool FetchTableStates(bool *started_tx);
 
 StringInfo	copybuf = NULL;
 
+/*
+ * Info used to track and evaluate row filters for each publication the
+ * relation is included in, and to calculate the column list.
+ */
+typedef struct PublicationInfo {
+
+	/* row filter (expression state) */
+	Node	   *rowfilter;
+
+	/* column list */
+	Bitmapset  *columns;
+
+} PublicationInfo;
+
 /*
  * Exit routine for synchronization worker.
  */
@@ -696,14 +710,14 @@ copy_read_data(void *outbuf, int minread, int maxread)
  */
 static void
 fetch_remote_table_info(char *nspname, char *relname,
-						LogicalRepRelation *lrel, List **qual)
+						LogicalRepRelation *lrel, List **pubinfos)
 {
 	WalRcvExecResult *res;
 	StringInfoData cmd;
 	TupleTableSlot *slot;
 	Oid			tableRow[] = {OIDOID, CHAROID, CHAROID};
 	Oid			attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID};
-	Oid			qualRow[] = {TEXTOID};
+	Oid			qualRow[] = {TEXTOID, INT2VECTOROID};
 	bool		isnull;
 	int			natt;
 	ListCell   *lc;
@@ -878,6 +892,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	/* We don't know the number of rows coming, so allocate enough space. */
 	lrel->attnames = palloc0(MaxTupleAttributeNumber * sizeof(char *));
 	lrel->atttyps = palloc0(MaxTupleAttributeNumber * sizeof(Oid));
+	lrel->attnums = palloc0(MaxTupleAttributeNumber * sizeof(int16));
 	lrel->attkeys = NULL;
 
 	/*
@@ -905,6 +920,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 		Assert(!isnull);
 
 		lrel->attnames[natt] = rel_colname;
+		lrel->attnums[natt] = attnum;
 		lrel->atttyps[natt] = DatumGetObjectId(slot_getattr(slot, 3, &isnull));
 		Assert(!isnull);
 
@@ -943,6 +959,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	 * 3) one of the subscribed publications is declared as ALL TABLES IN
 	 * SCHEMA that includes this relation
 	 */
+	*pubinfos = NIL;
 	if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 150000)
 	{
 		StringInfoData pub_names;
@@ -965,7 +982,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 		/* Check for row filters. */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
+						 "SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid), pr.prattrs"
 						 "  FROM pg_publication p"
 						 "  LEFT OUTER JOIN pg_publication_rel pr"
 						 "       ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
@@ -976,7 +993,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 						 lrel->remoteid,
 						 pub_names.data);
 
-		res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, 1, qualRow);
+		res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, lengthof(qualRow), qualRow);
 
 		if (res->status != WALRCV_OK_TUPLES)
 			ereport(ERROR,
@@ -993,21 +1010,31 @@ fetch_remote_table_info(char *nspname, char *relname,
 		slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 		while (tuplestore_gettupleslot(res->tuplestore, true, false, slot))
 		{
+			PublicationInfo *pubinfo;
 			Datum		rf = slot_getattr(slot, 1, &isnull);
+			Datum		cl;
+
+			pubinfo = (PublicationInfo	*) palloc0(sizeof(PublicationInfo));
+
+			if (!isnull)
+				pubinfo->rowfilter = (Node *) makeString(TextDatumGetCString(rf));
+
+			cl = slot_getattr(slot, 2, &isnull);
 
 			if (!isnull)
-				*qual = lappend(*qual, makeString(TextDatumGetCString(rf)));
-			else
 			{
-				/* Ignore filters and cleanup as necessary. */
-				if (*qual)
+				int i;
+				int2vector *prattrs = (int2vector *) cl;
+
+				for (i = 0; i < prattrs->dim1; i++)
 				{
-					list_free_deep(*qual);
-					*qual = NIL;
+					pubinfo->columns = bms_add_member(pubinfo->columns,
+													  prattrs->values[i]);
 				}
-				break;
 			}
 
+			*pubinfos = lappend(*pubinfos, pubinfo);
+
 			ExecClearTuple(slot);
 		}
 		ExecDropSingleTupleTableSlot(slot);
@@ -1028,7 +1055,7 @@ copy_table(Relation rel)
 {
 	LogicalRepRelMapEntry *relmapentry;
 	LogicalRepRelation lrel;
-	List	   *qual = NIL;
+	List	   *pubinfos = NIL;
 	WalRcvExecResult *res;
 	StringInfoData cmd;
 	CopyFromState cstate;
@@ -1037,7 +1064,7 @@ copy_table(Relation rel)
 
 	/* Get the publisher relation info. */
 	fetch_remote_table_info(get_namespace_name(RelationGetNamespace(rel)),
-							RelationGetRelationName(rel), &lrel, &qual);
+							RelationGetRelationName(rel), &lrel, &pubinfos);
 
 	/* Put the relation into relmap. */
 	logicalrep_relmap_update(&lrel);
@@ -1050,7 +1077,9 @@ copy_table(Relation rel)
 	initStringInfo(&cmd);
 
 	/* Regular table with no row filter */
-	if (lrel.relkind == RELKIND_RELATION && qual == NIL)
+	/* FIXME pubinfos is never NULL now, need to detect absence of row filters
+	/* FIXME pubinfos is never NIL now; we need to detect the absence of
+	 * row filters in a different way */
 	{
 		appendStringInfo(&cmd, "COPY %s (",
 						 quote_qualified_identifier(lrel.nspname, lrel.relname));
@@ -1076,11 +1105,55 @@ copy_table(Relation rel)
 		 * (SELECT ...), but we can't just do SELECT * because we need to not
 		 * copy generated columns. For tables with any row filters, build a
 		 * SELECT query with OR'ed row filters for COPY.
+		 *
+		 * FIXME This can be simplified if all publications have the same
+		 * column list (or no column list), in which case we don't need the
+		 * CASE expressions at all.
 		 */
 		appendStringInfoString(&cmd, "COPY (SELECT ");
 		for (int i = 0; i < lrel.natts; i++)
 		{
-			appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
+			ListCell   *lc;
+			StringInfoData qual;
+			bool		no_filter = false;
+			bool		is_first = true;
+
+			initStringInfo(&qual);
+
+			/* find all row filters for the column, combine them using OR */
+			foreach (lc, pubinfos)
+			{
+				PublicationInfo *pubinfo = (PublicationInfo *) lfirst(lc);
+
+				/* not included in this publication column list */
+				if (pubinfo->columns != NULL &&
+					!bms_is_member(lrel.attnums[i], pubinfo->columns))
+					continue;
+
+				/* covered by this publication, is there an expression? */
+				if (pubinfo->rowfilter == NULL)
+				{
+					no_filter = true;
+					break;
+				}
+
+				if (is_first)
+				{
+					appendStringInfo(&qual, "%s", strVal(pubinfo->rowfilter));
+					is_first = false;
+				}
+				else
+					appendStringInfo(&qual, " OR %s", strVal(pubinfo->rowfilter));
+			}
+
+			if (no_filter)
+				appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
+			else
+				appendStringInfo(&cmd, "(CASE WHEN (%s) THEN %s ELSE NULL END) AS %s",
+								 qual.data,
+								 quote_identifier(lrel.attnames[i]),
+								 quote_identifier(lrel.attnames[i]));
+			
 			if (i < lrel.natts - 1)
 				appendStringInfoString(&cmd, ", ");
 		}
@@ -1095,6 +1168,24 @@ copy_table(Relation rel)
 			appendStringInfoString(&cmd, "ONLY ");
 
 		appendStringInfoString(&cmd, quote_qualified_identifier(lrel.nspname, lrel.relname));
+
+		{
+			List *qual = NIL;
+			ListCell *lc;
+
+			foreach (lc, pubinfos)
+			{
+				PublicationInfo *pubinfo = (PublicationInfo *) lfirst(lc);
+
+				if (pubinfo->rowfilter == NULL)
+				{
+					qual = NIL;
+					break;
+				}
+
+				qual = lappend(qual, pubinfo->rowfilter);
+			}
+
 		/* list of OR'ed filters */
 		if (qual != NIL)
 		{
@@ -1110,6 +1201,8 @@ copy_table(Relation rel)
 			list_free_deep(qual);
 		}
 
+		}
+
 		appendStringInfoString(&cmd, ") TO STDOUT");
 	}
 	res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, 0, NULL);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index b197bfd565d..85456e6d9f5 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -137,13 +137,13 @@ typedef struct RelationSyncEntry
 	PublicationActions pubactions;
 
 	/*
-	 * ExprState array for row filter. Different publication actions don't
-	 * allow multiple expressions to always be combined into one, because
-	 * updates or deletes restrict the column in expression to be part of the
-	 * replica identity index whereas inserts do not have this restriction, so
-	 * there is one ExprState per publication action.
+	 * Info about row filters and column lists for each publication this
+	 * relation is included in. We keep a list with per-publication info in
+	 * order to calculate appropriate column list depending on which row
+	 * filter(s) match. Each element contains an ExprState for the filter
+	 * and column list.
 	 */
-	ExprState  *exprstate[NUM_ROWFILTER_PUBACTIONS];
+	dlist_head	pubinfos;		/* one per publication */
 	EState	   *estate;			/* executor state used for row filter */
 	TupleTableSlot *new_slot;	/* slot for storing new tuple */
 	TupleTableSlot *old_slot;	/* slot for storing old tuple */
@@ -208,6 +208,29 @@ typedef struct PGOutputTxnData
 								 * been sent */
 }		PGOutputTxnData;
 
+/*
+ * Info used to track and evaluate row filters for each publication the
+ * relation is included in, and to calculate the column list.
+ */
+typedef struct PublicationInfo {
+
+	/* doubly-linked list */
+	dlist_node	node;
+
+	/* publication OID (XXX not really needed) */
+	Oid			oid;
+
+	/* row filter (expression state) */
+	ExprState  *rowfilter;
+
+	/* column list */
+	Bitmapset  *columns;
+
+	/* actions published by the publication */
+	PublicationActions	pubactions;
+
+} PublicationInfo;
+
 /* Map used to remember which relation schemas we sent. */
 static HTAB *RelationSyncCache = NULL;
 
@@ -235,12 +258,8 @@ static bool pgoutput_row_filter_exec_expr(ExprState *state,
 static bool pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 								TupleTableSlot **new_slot_ptr,
 								RelationSyncEntry *entry,
-								ReorderBufferChangeType *action);
-
-/* column list routines */
-static void pgoutput_column_list_init(PGOutputData *data,
-									  List *publications,
-									  RelationSyncEntry *entry);
+								ReorderBufferChangeType *action,
+								Bitmapset **column_list);
 
 /*
  * Specify output plugin callbacks
@@ -822,18 +841,24 @@ pgoutput_ensure_entry_cxt(PGOutputData *data, RelationSyncEntry *entry)
 }
 
 /*
- * Initialize the row filter.
+ * Initialize the row filter and column list.
+ *
+ * Prepare information (ExprState, etc) used to evaluate per-publication row
+ * filters, and column lists.
+ *
+ * We also calculate a "total" column list as a union of all per-publication
+ * column lists, irrespective of row filters. This is used to send the
+ * schema for the relsync entry, etc.
  */
 static void
 pgoutput_row_filter_init(PGOutputData *data, List *publications,
 						 RelationSyncEntry *entry)
 {
 	ListCell   *lc;
-	List	   *rfnodes[] = {NIL, NIL, NIL};	/* One per pubaction */
-	bool		no_filter[] = {false, false, false};	/* One per pubaction */
 	MemoryContext oldctx;
-	int			idx;
-	bool		has_filter = true;
+	bool		all_columns = false;
+
+	dlist_init(&entry->pubinfos);
 
 	/*
 	 * Find if there are any row filters for this relation. If there are, then
@@ -855,7 +880,12 @@ pgoutput_row_filter_init(PGOutputData *data, List *publications,
 		Publication *pub = lfirst(lc);
 		HeapTuple	rftuple = NULL;
 		Datum		rfdatum = 0;
+		Datum		cfdatum = 0;
 		bool		pub_no_filter = false;
+		bool		pub_no_list = false;
+		Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
+
+		PublicationInfo *pubinfo = NULL;
 
 		if (pub->alltables)
 		{
@@ -865,6 +895,7 @@ pgoutput_row_filter_init(PGOutputData *data, List *publications,
 			 * publications it does).
 			 */
 			pub_no_filter = true;
+			pub_no_list = true;
 		}
 		else
 		{
@@ -881,191 +912,75 @@ pgoutput_row_filter_init(PGOutputData *data, List *publications,
 				rfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple,
 										  Anum_pg_publication_rel_prqual,
 										  &pub_no_filter);
+
+				/*
+				 * Lookup the column list attribute.
+				 *
+				 * Null indicates no list.
+				 */
+				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple,
+										  Anum_pg_publication_rel_prattrs,
+										  &pub_no_list);
 			}
 			else
 			{
 				pub_no_filter = true;
+				pub_no_list = true;
 			}
 		}
 
-		if (pub_no_filter)
-		{
-			if (rftuple)
-				ReleaseSysCache(rftuple);
-
-			no_filter[PUBACTION_INSERT] |= pub->pubactions.pubinsert;
-			no_filter[PUBACTION_UPDATE] |= pub->pubactions.pubupdate;
-			no_filter[PUBACTION_DELETE] |= pub->pubactions.pubdelete;
-
-			/*
-			 * Quick exit if all the DML actions are publicized via this
-			 * publication.
-			 */
-			if (no_filter[PUBACTION_INSERT] &&
-				no_filter[PUBACTION_UPDATE] &&
-				no_filter[PUBACTION_DELETE])
-			{
-				has_filter = false;
-				break;
-			}
-
-			/* No additional work for this publication. Next one. */
-			continue;
-		}
-
-		/* Form the per pubaction row filter lists. */
-		if (pub->pubactions.pubinsert && !no_filter[PUBACTION_INSERT])
-			rfnodes[PUBACTION_INSERT] = lappend(rfnodes[PUBACTION_INSERT],
-												TextDatumGetCString(rfdatum));
-		if (pub->pubactions.pubupdate && !no_filter[PUBACTION_UPDATE])
-			rfnodes[PUBACTION_UPDATE] = lappend(rfnodes[PUBACTION_UPDATE],
-												TextDatumGetCString(rfdatum));
-		if (pub->pubactions.pubdelete && !no_filter[PUBACTION_DELETE])
-			rfnodes[PUBACTION_DELETE] = lappend(rfnodes[PUBACTION_DELETE],
-												TextDatumGetCString(rfdatum));
-
-		ReleaseSysCache(rftuple);
-	}							/* loop all subscribed publications */
-
-	/* Clean the row filter */
-	for (idx = 0; idx < NUM_ROWFILTER_PUBACTIONS; idx++)
-	{
-		if (no_filter[idx])
-		{
-			list_free_deep(rfnodes[idx]);
-			rfnodes[idx] = NIL;
-		}
-	}
-
-	if (has_filter)
-	{
-		Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
-
 		pgoutput_ensure_entry_cxt(data, entry);
 
-		/*
-		 * Now all the filters for all pubactions are known. Combine them when
-		 * their pubactions are the same.
-		 */
 		oldctx = MemoryContextSwitchTo(entry->entry_cxt);
-		entry->estate = create_estate_for_relation(relation);
-		for (idx = 0; idx < NUM_ROWFILTER_PUBACTIONS; idx++)
-		{
-			List	   *filters = NIL;
-			Expr	   *rfnode;
 
-			if (rfnodes[idx] == NIL)
-				continue;
-
-			foreach(lc, rfnodes[idx])
-				filters = lappend(filters, stringToNode((char *) lfirst(lc)));
-
-			/* combine the row filter and cache the ExprState */
-			rfnode = make_orclause(filters);
-			entry->exprstate[idx] = ExecPrepareExpr(rfnode, entry->estate);
-		}						/* for each pubaction */
-		MemoryContextSwitchTo(oldctx);
-
-		RelationClose(relation);
-	}
-}
+		pubinfo = (PublicationInfo *) palloc0(sizeof(PublicationInfo));
 
-/*
- * Initialize the column list.
- */
-static void
-pgoutput_column_list_init(PGOutputData *data, List *publications,
-						  RelationSyncEntry *entry)
-{
-	ListCell   *lc;
+		pubinfo->oid = pub->oid;
+		pubinfo->pubactions = pub->pubactions;
 
-	/*
-	 * Find if there are any column lists for this relation. If there are,
-	 * build a bitmap merging all the column lists.
-	 *
-	 * All the given publication-table mappings must be checked.
-	 *
-	 * Multiple publications might have multiple column lists for this relation.
-	 *
-	 * FOR ALL TABLES and FOR ALL TABLES IN SCHEMA implies "don't use column
-	 * list" so it takes precedence.
-	 */
-	foreach(lc, publications)
-	{
-		Publication *pub = lfirst(lc);
-		HeapTuple	cftuple = NULL;
-		Datum		cfdatum = 0;
+		if (!pub_no_filter)
+		{
+			entry->estate = create_estate_for_relation(relation);
 
-		/*
-		 * Assume there's no column list. Only if we find pg_publication_rel
-		 * entry with a column list we'll switch it to false.
-		 */
-		bool		pub_no_list = true;
+			pubinfo->rowfilter
+				= ExecPrepareExpr(stringToNode(TextDatumGetCString(rfdatum)),
+								  entry->estate);
+		}
 
 		/*
-		 * If the publication is FOR ALL TABLES then it is treated the same as if
-		 * there are no column lists (even if other publications have a list).
+		 * Build the column list bitmap in the per-entry context.
+		 *
+		 * We need to merge column lists from all publications, so we
+		 * update the same bitmapset. If the column list is null, we
+		 * interpret it as replicating all columns.
 		 */
-		if (!pub->alltables)
+		pubinfo->columns = NULL;
+		if (!pub_no_list)	/* when not null */
 		{
-			/*
-			 * Check for the presence of a column list in this publication.
-			 *
-			 * Note: If we find no pg_publication_rel row, it's a publication
-			 * defined for a whole schema, so it can't have a column list, just
-			 * like a FOR ALL TABLES publication.
-			 */
-			cftuple = SearchSysCache2(PUBLICATIONRELMAP,
-									  ObjectIdGetDatum(entry->publish_as_relid),
-									  ObjectIdGetDatum(pub->oid));
-
-			if (HeapTupleIsValid(cftuple))
-			{
-				/*
-				 * Lookup the column list attribute.
-				 *
-				 * Note: We update the pub_no_list value directly, because if
-				 * the value is NULL, we have no list (and vice versa).
-				 */
-				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
-										  Anum_pg_publication_rel_prattrs,
-										  &pub_no_list);
-
-				/*
-				 * Build the column list bitmap in the per-entry context.
-				 *
-				 * We need to merge column lists from all publications, so we
-				 * update the same bitmapset. If the column list is null, we
-				 * interpret it as replicating all columns.
-				 */
-				if (!pub_no_list)	/* when not null */
-				{
-					pgoutput_ensure_entry_cxt(data, entry);
+			pubinfo->columns = pub_collist_to_bitmapset(pubinfo->columns,
+														cfdatum,
+														entry->entry_cxt);
 
-					entry->columns = pub_collist_to_bitmapset(entry->columns,
-															  cfdatum,
-															  entry->entry_cxt);
-				}
-			}
+			entry->columns = pub_collist_to_bitmapset(entry->columns,
+													  cfdatum,
+													  entry->entry_cxt);
 		}
+		else
+			all_columns = true;
 
-		/*
-		 * Found a publication with no column list, so we're done. But first
-		 * discard column list we might have from preceding publications.
-		 */
-		if (pub_no_list)
-		{
-			if (cftuple)
-				ReleaseSysCache(cftuple);
+		if (HeapTupleIsValid(rftuple))
+			ReleaseSysCache(rftuple);
 
-			bms_free(entry->columns);
-			entry->columns = NULL;
+		MemoryContextSwitchTo(oldctx);
+		RelationClose(relation);
 
-			break;
-		}
+		dlist_push_tail(&entry->pubinfos, &pubinfo->node);
 
-		ReleaseSysCache(cftuple);
 	}							/* loop all subscribed publications */
+
+	/* any of the publications replicates all columns */
+	if (all_columns)
+		entry->columns = NULL;
 }
 
 /*
@@ -1115,7 +1030,8 @@ init_tuple_slot(PGOutputData *data, Relation relation,
 }
 
 /*
- * Change is checked against the row filter if any.
+ * Change is checked against the row filter, if any, and the applicable
+ * column list is calculated (with respect to matching row filters).
  *
  * Returns true if the change is to be replicated, else false.
  *
@@ -1136,6 +1052,8 @@ init_tuple_slot(PGOutputData *data, Relation relation,
  *
  * The new action is updated in the action parameter.
  *
+ * The calculated column list is returned in the column_list parameter.
+ *
  * The new slot could be updated when transforming the UPDATE into INSERT,
  * because the original new tuple might not have column values from the replica
  * identity.
@@ -1167,17 +1085,21 @@ init_tuple_slot(PGOutputData *data, Relation relation,
 static bool
 pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 					TupleTableSlot **new_slot_ptr, RelationSyncEntry *entry,
-					ReorderBufferChangeType *action)
+					ReorderBufferChangeType *action, Bitmapset **column_list)
 {
 	TupleDesc	desc;
 	int			i;
-	bool		old_matched,
-				new_matched,
+	bool		old_matched_any = false,
+				new_matched_any = false,
 				result;
-	TupleTableSlot *tmp_new_slot;
+	TupleTableSlot *tmp_new_slot = NULL;
 	TupleTableSlot *new_slot = *new_slot_ptr;
-	ExprContext *ecxt;
-	ExprState  *filter_exprstate;
+	dlist_iter	iter;
+	bool		matching = false;
+
+	/* Column list calculated from publications matching the row filter. */
+	Bitmapset  *columns = NULL;
+	bool		all_columns = false;
 
 	/*
 	 * We need this map to avoid relying on ReorderBufferChangeType enums
@@ -1195,115 +1117,191 @@ pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 
 	Assert(new_slot || old_slot);
 
-	/* Get the corresponding row filter */
-	filter_exprstate = entry->exprstate[map_changetype_pubaction[*action]];
+	dlist_foreach(iter, &entry->pubinfos)
+	{
+		ExprContext *ecxt;
+		bool		old_matched,
+					new_matched;
 
-	/* Bail out if there is no row filter */
-	if (!filter_exprstate)
-		return true;
+		PublicationInfo  *pubinfo
+			= dlist_container(PublicationInfo, node, iter.cur);
 
-	elog(DEBUG3, "table \"%s.%s\" has row filter",
-		 get_namespace_name(RelationGetNamespace(relation)),
-		 RelationGetRelationName(relation));
+		/* ignore publications not replicating this action */
+		if ((*action == REORDER_BUFFER_CHANGE_INSERT) &&
+			(!pubinfo->pubactions.pubinsert))
+			continue;
+		else if ((*action == REORDER_BUFFER_CHANGE_UPDATE) &&
+				 (!pubinfo->pubactions.pubupdate))
+			continue;
+		else if ((*action == REORDER_BUFFER_CHANGE_DELETE) &&
+			(!pubinfo->pubactions.pubdelete))
+			continue;
 
-	ResetPerTupleExprContext(entry->estate);
+		if (!pubinfo->rowfilter)
+		{
+			matching = true;
 
-	ecxt = GetPerTupleExprContext(entry->estate);
+			/*
+			 * Update/merge the column list.
+			 *
+			 * If the publication has no column list, we interpret it as a list
+			 * with all columns. Otherwise we just add it to the bitmap.
+			 *
+			 * FIXME This is repeated in three places. Maybe refactor?
+			 */
+			if (!pubinfo->columns)
+			{
+				all_columns = true;
+				bms_free(columns);
+				columns = NULL;
+			}
+			else if (!all_columns)
+				columns = bms_union(columns, pubinfo->columns);
 
-	/*
-	 * For the following occasions where there is only one tuple, we can
-	 * evaluate the row filter for that tuple and return.
-	 *
-	 * For inserts, we only have the new tuple.
-	 *
-	 * For updates, we can have only a new tuple when none of the replica
-	 * identity columns changed and none of those columns have external data
-	 * but we still need to evaluate the row filter for the new tuple as the
-	 * existing values of those columns might not match the filter. Also, users
-	 * can use constant expressions in the row filter, so we anyway need to
-	 * evaluate it for the new tuple.
-	 *
-	 * For deletes, we only have the old tuple.
-	 */
-	if (!new_slot || !old_slot)
-	{
-		ecxt->ecxt_scantuple = new_slot ? new_slot : old_slot;
-		result = pgoutput_row_filter_exec_expr(filter_exprstate, ecxt);
+			continue;
+		}
 
-		return result;
-	}
+		elog(DEBUG3, "table \"%s.%s\" has row filter",
+			get_namespace_name(RelationGetNamespace(relation)),
+			RelationGetRelationName(relation));
 
-	/*
-	 * Both the old and new tuples must be valid only for updates and need to
-	 * be checked against the row filter.
-	 */
-	Assert(map_changetype_pubaction[*action] == PUBACTION_UPDATE);
+		ResetPerTupleExprContext(entry->estate);
+
+		ecxt = GetPerTupleExprContext(entry->estate);
 
-	slot_getallattrs(new_slot);
-	slot_getallattrs(old_slot);
+		/*
+		 * For the following occasions where there is only one tuple, we can
+		 * evaluate the row filter for that tuple and return.
+		 *
+		 * For inserts, we only have the new tuple.
+		 *
+		 * For updates, we can have only a new tuple when none of the replica
+		 * identity columns changed and none of those columns have external data
+		 * but we still need to evaluate the row filter for the new tuple as the
+		 * existing values of those columns might not match the filter. Also, users
+		 * can use constant expressions in the row filter, so we anyway need to
+		 * evaluate it for the new tuple.
+		 *
+		 * For deletes, we only have the old tuple.
+		 */
+		if (!new_slot || !old_slot)
+		{
+			ecxt->ecxt_scantuple = new_slot ? new_slot : old_slot;
+			result = pgoutput_row_filter_exec_expr(pubinfo->rowfilter, ecxt);
 
-	tmp_new_slot = NULL;
-	desc = RelationGetDescr(relation);
+			matching |= result;
 
-	/*
-	 * The new tuple might not have all the replica identity columns, in which
-	 * case it needs to be copied over from the old tuple.
-	 */
-	for (i = 0; i < desc->natts; i++)
-	{
-		Form_pg_attribute att = TupleDescAttr(desc, i);
+			/*
+			 * FIXME refactor to reuse this code in multiple places
+			 * 
+			 * XXX Should this only update the column list when using the new
+			 * slot? If evaluating the old slot, that's a delete, no?
+			 */
+			if (result)
+			{
+				if (!pubinfo->columns)
+				{
+					all_columns = true;
+					bms_free(columns);
+					columns = NULL;
+				}
+				else if (!all_columns)
+				{
+					columns = bms_union(columns, pubinfo->columns);
+				}
+			}
+
+			continue;
+		}
 
 		/*
-		 * if the column in the new tuple or old tuple is null, nothing to do
+		 * Both the old and new tuples must be valid only for updates and need to
+		 * be checked against the row filter.
 		 */
-		if (new_slot->tts_isnull[i] || old_slot->tts_isnull[i])
-			continue;
+		Assert(map_changetype_pubaction[*action] == PUBACTION_UPDATE);
+
+		slot_getallattrs(new_slot);
+		slot_getallattrs(old_slot);
+
+		tmp_new_slot = NULL;
+		desc = RelationGetDescr(relation);
 
 		/*
-		 * Unchanged toasted replica identity columns are only logged in the
-		 * old tuple. Copy this over to the new tuple. The changed (or WAL
-		 * Logged) toast values are always assembled in memory and set as
-		 * VARTAG_INDIRECT. See ReorderBufferToastReplace.
+		 * The new tuple might not have all the replica identity columns, in which
+		 * case it needs to be copied over from the old tuple.
 		 */
-		if (att->attlen == -1 &&
-			VARATT_IS_EXTERNAL_ONDISK(new_slot->tts_values[i]) &&
-			!VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
+		for (i = 0; i < desc->natts; i++)
 		{
-			if (!tmp_new_slot)
+			Form_pg_attribute att = TupleDescAttr(desc, i);
+
+			/*
+			 * if the column in the new tuple or old tuple is null, nothing to do
+			 */
+			if (new_slot->tts_isnull[i] || old_slot->tts_isnull[i])
+				continue;
+
+			/*
+			 * Unchanged toasted replica identity columns are only logged in the
+			 * old tuple. Copy this over to the new tuple. The changed (or WAL
+			 * Logged) toast values are always assembled in memory and set as
+			 * VARTAG_INDIRECT. See ReorderBufferToastReplace.
+			 */
+			if (att->attlen == -1 &&
+				VARATT_IS_EXTERNAL_ONDISK(new_slot->tts_values[i]) &&
+				!VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
 			{
-				tmp_new_slot = MakeSingleTupleTableSlot(desc, &TTSOpsVirtual);
-				ExecClearTuple(tmp_new_slot);
+				if (!tmp_new_slot)
+				{
+					tmp_new_slot = MakeSingleTupleTableSlot(desc, &TTSOpsVirtual);
+					ExecClearTuple(tmp_new_slot);
 
-				memcpy(tmp_new_slot->tts_values, new_slot->tts_values,
-					   desc->natts * sizeof(Datum));
-				memcpy(tmp_new_slot->tts_isnull, new_slot->tts_isnull,
-					   desc->natts * sizeof(bool));
-			}
+					memcpy(tmp_new_slot->tts_values, new_slot->tts_values,
+						   desc->natts * sizeof(Datum));
+					memcpy(tmp_new_slot->tts_isnull, new_slot->tts_isnull,
+						   desc->natts * sizeof(bool));
+				}
 
-			tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
-			tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
+				tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
+				tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
+			}
 		}
-	}
 
-	ecxt->ecxt_scantuple = old_slot;
-	old_matched = pgoutput_row_filter_exec_expr(filter_exprstate, ecxt);
+		ecxt->ecxt_scantuple = old_slot;
 
-	if (tmp_new_slot)
-	{
-		ExecStoreVirtualTuple(tmp_new_slot);
-		ecxt->ecxt_scantuple = tmp_new_slot;
-	}
-	else
-		ecxt->ecxt_scantuple = new_slot;
+		old_matched = pgoutput_row_filter_exec_expr(pubinfo->rowfilter, ecxt);
+		old_matched_any |= old_matched;
+
+		if (tmp_new_slot)
+		{
+			ExecStoreVirtualTuple(tmp_new_slot);
+			ecxt->ecxt_scantuple = tmp_new_slot;
+		}
+		else
+			ecxt->ecxt_scantuple = new_slot;
 
-	new_matched = pgoutput_row_filter_exec_expr(filter_exprstate, ecxt);
+		new_matched = pgoutput_row_filter_exec_expr(pubinfo->rowfilter, ecxt);
+		new_matched_any |= new_matched;
 
-	/*
-	 * Case 1: if both tuples don't match the row filter, bailout. Send
-	 * nothing.
-	 */
-	if (!old_matched && !new_matched)
-		return false;
+		/*
+		 * Case 1: if both tuples don't match the row filter, bailout. Send
+		 * nothing.
+		 */
+		if (!old_matched && !new_matched)
+			continue;	/* continue with the next row filter */
+
+		/*
+		 * Case 4: if both tuples match the row filter, transformation isn't
+		 * required. (*action is default UPDATE).
+		 */
+		if (!pubinfo->columns)
+		{
+			all_columns = true;
+			bms_free(columns);
+			columns = NULL;
+		}
+		else if (!all_columns && new_matched)
+			columns = bms_union(columns, pubinfo->columns);
+	}
 
 	/*
 	 * Case 2: if the old tuple doesn't satisfy the row filter but the new
@@ -1314,9 +1312,10 @@ pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 	 * while inserting the tuple in the downstream node, we have all the
 	 * required column values.
 	 */
-	if (!old_matched && new_matched)
+	if (!old_matched_any && new_matched_any)
 	{
 		*action = REORDER_BUFFER_CHANGE_INSERT;
+		matching = true;
 
 		if (tmp_new_slot)
 			*new_slot_ptr = tmp_new_slot;
@@ -1329,15 +1328,18 @@ pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 	 * This transformation does not require another tuple. The Old tuple will
 	 * be used for DELETE.
 	 */
-	else if (old_matched && !new_matched)
+	else if (old_matched_any && !new_matched_any)
+	{
 		*action = REORDER_BUFFER_CHANGE_DELETE;
+		matching = true;
+	}
+	else if (old_matched_any && new_matched_any)
+		matching = true;
 
-	/*
-	 * Case 4: if both tuples match the row filter, transformation isn't
-	 * required. (*action is default UPDATE).
-	 */
+	if (column_list)
+		*column_list = columns;
 
-	return true;
+	return matching;
 }
 
 /*
@@ -1359,6 +1361,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	ReorderBufferChangeType action = change->action;
 	TupleTableSlot *old_slot = NULL;
 	TupleTableSlot *new_slot = NULL;
+	Bitmapset	   *columns = NULL;
 
 	if (!is_publishable_relation(relation))
 		return;
@@ -1423,7 +1426,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 
 			/* Check row filter */
 			if (!pgoutput_row_filter(targetrel, NULL, &new_slot, relentry,
-									 &action))
+									 &action, &columns))
 				break;
 
 			/*
@@ -1444,7 +1447,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 
 			OutputPluginPrepareWrite(ctx, true);
 			logicalrep_write_insert(ctx->out, xid, targetrel, new_slot,
-									data->binary, relentry->columns);
+									data->binary,
+									relentry->columns, columns);
 			OutputPluginWrite(ctx, true);
 			break;
 		case REORDER_BUFFER_CHANGE_UPDATE:
@@ -1483,7 +1487,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 
 			/* Check row filter */
 			if (!pgoutput_row_filter(targetrel, old_slot, &new_slot,
-									 relentry, &action))
+									 relentry, &action, &columns))
 				break;
 
 			/* Send BEGIN if we haven't yet */
@@ -1503,12 +1507,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				case REORDER_BUFFER_CHANGE_INSERT:
 					logicalrep_write_insert(ctx->out, xid, targetrel,
 											new_slot, data->binary,
-											relentry->columns);
+											relentry->columns, columns);
 					break;
 				case REORDER_BUFFER_CHANGE_UPDATE:
 					logicalrep_write_update(ctx->out, xid, targetrel,
 											old_slot, new_slot, data->binary,
-											relentry->columns);
+											relentry->columns, columns);
 					break;
 				case REORDER_BUFFER_CHANGE_DELETE:
 					logicalrep_write_delete(ctx->out, xid, targetrel,
@@ -1547,7 +1551,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 
 				/* Check row filter */
 				if (!pgoutput_row_filter(targetrel, old_slot, &new_slot,
-										 relentry, &action))
+										 relentry, &action, NULL))
 					break;
 
 				/* Send BEGIN if we haven't yet */
@@ -1977,7 +1981,7 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 			entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
 		entry->new_slot = NULL;
 		entry->old_slot = NULL;
-		memset(entry->exprstate, 0, sizeof(entry->exprstate));
+		dlist_init(&entry->pubinfos);
 		entry->entry_cxt = NULL;
 		entry->publish_as_relid = InvalidOid;
 		entry->columns = NULL;
@@ -2056,7 +2060,7 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 
 		entry->entry_cxt = NULL;
 		entry->estate = NULL;
-		memset(entry->exprstate, 0, sizeof(entry->exprstate));
+		dlist_init(&entry->pubinfos);
 
 		/*
 		 * Build publication cache. We can't use one provided by relcache as
@@ -2192,11 +2196,8 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 			/* Initialize the tuple slot and map */
 			init_tuple_slot(data, relation, entry);
 
-			/* Initialize the row filter */
+			/* Initialize the row filter and column list info */
 			pgoutput_row_filter_init(data, rel_publications, entry);
-
-			/* Initialize the column list */
-			pgoutput_column_list_init(data, rel_publications, entry);
 		}
 
 		list_free(pubids);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index a771ab8ff33..1a3f4f34bf4 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -104,6 +104,7 @@ typedef struct LogicalRepRelation
 	char	   *relname;		/* relation name */
 	int			natts;			/* number of columns */
 	char	  **attnames;		/* column names */
+	int16	   *attnums;		/* column attnums */
 	Oid		   *atttyps;		/* column types */
 	char		replident;		/* replica identity */
 	char		relkind;		/* remote relation kind */
@@ -209,12 +210,14 @@ extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
 extern void logicalrep_write_insert(StringInfo out, TransactionId xid,
 									Relation rel,
 									TupleTableSlot *newslot,
-									bool binary, Bitmapset *columns);
+									bool binary, Bitmapset *schema_columns,
+									Bitmapset *columns);
 extern LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup);
 extern void logicalrep_write_update(StringInfo out, TransactionId xid,
 									Relation rel,
 									TupleTableSlot *oldslot,
-									TupleTableSlot *newslot, bool binary, Bitmapset *columns);
+									TupleTableSlot *newslot, bool binary,
+									Bitmapset *schema_columns, Bitmapset *columns);
 extern LogicalRepRelId logicalrep_read_update(StringInfo in,
 											  bool *has_oldtuple, LogicalRepTupleData *oldtup,
 											  LogicalRepTupleData *newtup);
#16Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Peter Eisentraut (#14)
Re: bogus: logical replication rows/cols combinations

On 4/28/22 14:26, Peter Eisentraut wrote:

On 27.04.22 12:33, Amit Kapila wrote:

Currently, when the subscription has multiple publications, we combine
the objects, and actions of those publications. It happens for
'publish_via_partition_root', publication actions, tables, column
lists, or row filters. I think the whole design works on this idea
even the initial table sync. I think it might need a major change
(which I am not sure about at this stage) if we want to make the
initial sync also behave similar to what you are proposing.

If one publication says "publish if insert" and another publication says
"publish if update", then the combination of that is clearly "publish if
insert or update".  Similarly, if one publication says "WHERE (foo)" and
one says "WHERE (bar)", then the combination is "WHERE (foo OR bar)".

But if one publication says "publish columns a and b if condition-X" and
another publication says "publish columns a and c if not-condition-X",
then the combination is clearly *not* "publish columns a, b, c if true".
That is not logical, in the literal sense of that word.

I wonder how we handle the combination of

pub1: publish=insert WHERE (foo)
pub2: publish=update WHERE (bar)

I think it would be incorrect if the combination is

pub1, pub2: publish=insert,update WHERE (foo OR bar).

That's a good question, actually. No, we don't combine the publications
like this, the row filters are kept "per action". But the exact behavior
turns out to be rather confusing in this case.

(Note: This has nothing to do with column lists.)

Consider an example similar to what Alvaro posted earlier:

create table uno (a int primary key, b int, c int);

create publication uno for table uno where (a > 0)
with (publish='insert');

create publication dos for table uno where (a < 0)
with (publish='update');

And do this:

insert into uno values (1, 2, 3), (-1, 3, 4)

which on the subscriber produces just one row, because (a<0) replicates
only updates:

a | b | c
---+---+---
1 | 2 | 3
(1 row)

Now, let's update the (a<0) row.

update uno set a = 2 where a = -1;

It might seem reasonable to expect the updated row (2,3,4) to appear on
the subscriber, but no - that's not what happens. Because we have (a<0)
for UPDATE, and we evaluate this on the old row (matches) and new row
(does not match). And pgoutput_row_filter() decides the update needs to
be converted to DELETE, even though the old row was not replicated at all.

I'm not sure if pgoutput_row_filter() can even make reasonable decisions
with such configuration (combination of row filters, actions ...). But
it sure seems confusing, because if you just inserted the updated row,
it would get replicated.

Which brings me to a second problem, related to this one. Imagine you
create the subscription *after* inserting the two rows. In that case you
get this:

a | b | c
----+---+---
1 | 2 | 3
-1 | 3 | 4
(2 rows)

because tablesync.c ignores which actions the publication (and thus
the rowfilter) is defined for.

I think it's natural to expect that (INSERT + sync) and (sync + INSERT)
produce the same output on the subscriber.

I'm not sure we can actually make this perfectly sane with arbitrary
combinations of filters and actions. It would probably depend on whether
the actions are commutative, associative and stuff like that. But maybe
we can come up with restrictions that'd make this sane?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#16)
Re: bogus: logical replication rows/cols combinations

On Thu, Apr 28, 2022 at 11:00 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/28/22 14:26, Peter Eisentraut wrote:

On 27.04.22 12:33, Amit Kapila wrote:

I wonder how we handle the combination of

pub1: publish=insert WHERE (foo)
pub2: publish=update WHERE (bar)

I think it would be incorrect if the combination is

pub1, pub2: publish=insert,update WHERE (foo OR bar).

That's a good question, actually. No, we don't combine the publications
like this, the row filters are kept "per action".

Right, and it won't work even if we try to combine in this case
because of replica identity restrictions.

But the exact behavior
turns out to be rather confusing in this case.

(Note: This has nothing to do with column lists.)

Consider an example similar to what Alvaro posted earlier:

create table uno (a int primary key, b int, c int);

create publication uno for table uno where (a > 0)
with (publish='insert');

create publication dos for table uno where (a < 0)
with (publish='update');

And do this:

insert into uno values (1, 2, 3), (-1, 3, 4)

which on the subscriber produces just one row, because (a<0) replicates
only updates:

a | b | c
---+---+---
1 | 2 | 3
(1 row)

Now, let's update the (a<0) row.

update uno set a = 2 where a = -1;

It might seem reasonable to expect the updated row (2,3,4) to appear on
the subscriber, but no - that's not what happens. Because we have (a<0)
for UPDATE, and we evaluate this on the old row (matches) and new row
(does not match). And pgoutput_row_filter() decides the update needs to
be converted to DELETE, despite the old row was not replicated at all.

Right, but we don't know what would have happened previously: maybe the
user altered the publication action after the initial row was
published, in which case this DELETE is required, as shown in the
example below. We can only make the decision based on the current
tuple. For example:

create table uno (a int primary key, b int, c int);

create publication uno for table uno where (a > 0)
with (publish='insert');

create publication dos for table uno where (a < 0)
with (publish='insert');

-- create subscription for both these publications.

insert into uno values (1, 2, 3), (-1, 3, 4);

Alter publication dos set (publish='update');

update uno set a = 2 where a = -1;

Now, in this case, the old row was replicated and we would need a
DELETE corresponding to it.

I'm not sure if pgoutput_row_filter() can even make reasonable decisions
with such configuration (combination of row filters, actions ...). But
it sure seems confusing, because if you just inserted the updated row,
it would get replicated.

True, but that is what the combination of publications suggests. The
publication that publishes inserts has different criteria than
updates, so such behavior (a particular row when inserted will be
replicated but when it came as a result of an update it won't be
replicated) is expected.

Which brings me to a second problem, related to this one. Imagine you
create the subscription *after* inserting the two rows. In that case you
get this:

a | b | c
----+---+---
1 | 2 | 3
-1 | 3 | 4
(2 rows)

because tablesync.c ignores which actions the publication (and thus
the rowfilter) is defined for.

Yeah, this is the behavior of tablesync.c with or without rowfilter.
It ignores publication actions. So, if you update any tuple before
creation of subscription it will be replicated but the same update
won't be replicated after initial sync if the publication just
publishes 'insert'. I think we can't decide which data to copy based
on publication actions as COPY wouldn't know if a particular row is
due to a fresh insert or due to an update. In your example, it is
possible that row (-1, 3, 4) would have been there due to an update.

I think it's natural to expect that (INSERT + sync) and (sync + INSERT)
produce the same output on the subscriber.

I'm not sure we can actually make this perfectly sane with arbitrary
combinations of filters and actions. It would probably depend on whether
the actions are commutative, associative and stuff like that. But maybe
we can come up with restrictions that'd make this sane?

True, I think to some extent we rely on users to define it sanely
otherwise currently also it can easily lead to even replication being
stuck. This can happen when the user is trying to operate on the same
table and define publication/subscription on multiple nodes for it.
See [1] where we are trying to deal with such a problem.

[1]: https://commitfest.postgresql.org/38/3610/

--
With Regards,
Amit Kapila.

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#14)
Re: bogus: logical replication rows/cols combinations

On Thu, Apr 28, 2022 at 5:56 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 27.04.22 12:33, Amit Kapila wrote:

Currently, when the subscription has multiple publications, we combine
the objects, and actions of those publications. It happens for
'publish_via_partition_root', publication actions, tables, column
lists, or row filters. I think the whole design works on this idea
even the initial table sync. I think it might need a major change
(which I am not sure about at this stage) if we want to make the
initial sync also behave similar to what you are proposing.

If one publication says "publish if insert" and another publication says
"publish if update", then the combination of that is clearly "publish if
insert or update". Similarly, if one publication says "WHERE (foo)" and
one says "WHERE (bar)", then the combination is "WHERE (foo OR bar)".

But if one publication says "publish columns a and b if condition-X" and
another publication says "publish columns a and c if not-condition-X",
then the combination is clearly *not* "publish columns a, b, c if true".
That is not logical, in the literal sense of that word.

So, what should be the behavior in the below cases:

Case-1:
pub1: "publish columns a and b if condition-X"
pub2: "publish column c if condition-X"

Isn't it okay to combine these?

Case-2:
pub1: "publish columns a and b if condition-X"
pub2: "publish columns c if condition-Y"

Here Y is a subset of condition X (say something like condition-X:
"col1 > 5" and condition-Y: "col1 > 10").

What should we do in such a case?

I think if there are some cases where combining them is okay, but
other cases where it is not, then it is better to prohibit the
'not-okay' cases, if that is feasible.

--
With Regards,
Amit Kapila.

#19Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#17)
Re: bogus: logical replication rows/cols combinations

On 4/29/22 06:48, Amit Kapila wrote:

On Thu, Apr 28, 2022 at 11:00 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/28/22 14:26, Peter Eisentraut wrote:

On 27.04.22 12:33, Amit Kapila wrote:

I wonder how we handle the combination of

pub1: publish=insert WHERE (foo)
pub2: publish=update WHERE (bar)

I think it would be incorrect if the combination is

pub1, pub2: publish=insert,update WHERE (foo OR bar).

That's a good question, actually. No, we don't combine the publications
like this, the row filters are kept "per action".

Right, and it won't work even if we try to combine in this case
because of replica identity restrictions.

But the exact behavior
turns out to be rather confusing in this case.

(Note: This has nothing to do with column lists.)

Consider an example similar to what Alvaro posted earlier:

create table uno (a int primary key, b int, c int);

create publication uno for table uno where (a > 0)
with (publish='insert');

create publication dos for table uno where (a < 0)
with (publish='update');

And do this:

insert into uno values (1, 2, 3), (-1, 3, 4)

which on the subscriber produces just one row, because (a<0) replicates
only updates:

a | b | c
---+---+---
1 | 2 | 3
(1 row)

Now, let's update the (a<0) row.

update uno set a = 2 where a = -1;

It might seem reasonable to expect the updated row (2,3,4) to appear on
the subscriber, but no - that's not what happens. Because we have (a<0)
for UPDATE, and we evaluate this on the old row (matches) and new row
(does not match). And pgoutput_row_filter() decides the update needs to
be converted to DELETE, even though the old row was not replicated at all.

Right, but we don't know what would have happened previously: maybe the
user altered the publication action after the initial row was
published, in which case this DELETE is required, as shown in the
example below. We can only make the decision based on the current
tuple. For example:

create table uno (a int primary key, b int, c int);

create publication uno for table uno where (a > 0)
with (publish='insert');

create publication dos for table uno where (a < 0)
with (publish='insert');

-- create subscription for both these publications.

insert into uno values (1, 2, 3), (-1, 3, 4);

Alter publication dos set (publish='update');

update uno set a = 2 where a = -1;

Now, in this case, the old row was replicated and we would need a
DELETE corresponding to it.

I think such issues due to ALTER of the publication are somewhat
expected, and I think users will understand they might need to resync
the subscription or something like that.

A similar example might be just changing the where condition,

create publication p for table t where (a > 10);

and then

alter publication p set table t where (a > 15);

If we replicated any rows with (a > 10) and (a <= 15), we'll just stop
replicating them. But if we re-create the subscription, we end up with a
different set of rows on the subscriber, omitting rows with (a <= 15).

In principle we'd need to replicate the ALTER somehow, to delete or
insert the rows that start/stop matching the row filter. It's a bit
similar to not replicating DDL, perhaps.

But I think the issue I've described is different, because you don't
have to change the subscriptions at all and you'll still have the
problem. I mean, imagine doing this:

-- publisher
create table t (a int primary key, b int);
create publication p for table t where (a > 10) with (publish='update');

-- subscriber
create table t (a int primary key, b int);
create subscription s connection '...' publication p;

-- publisher
insert into t select i, i from generate_series(1,20) s(i);
update t set b = b * 10;

-- subscriber
--> has no rows in "t"
--> recreate the subscription
drop subscription s;
create subscription s connection '...' publication p;

--> now it has all the rows with (a>10), because tablesync ignores
publication actions

The reason why I find this really annoying is that it makes it almost
impossible to setup two logical replicas that'd be "consistent", unless
you create them at the same time (= without any writes in between). And
it's damn difficult to think about the inconsistencies.

IMHO this all stems from allowing row filters and restricting pubactions
at the same time (notice this only used a single publication). So maybe
the best option would be to disallow combining these two features? That
would ensure the row filter is always applied to all actions in a
consistent manner, preventing all these issues.

Maybe that's not possible - maybe there are valid use cases that would
need such combination, and you mentioned replica identity might be an
issue (and maybe requiring RIF with row filters is not desirable).

So maybe we should at least warn against this in the documentation?

I'm not sure if pgoutput_row_filter() can even make reasonable decisions
with such configuration (combination of row filters, actions ...). But
it sure seems confusing, because if you just inserted the updated row,
it would get replicated.

True, but that is what the combination of publications suggests. The
publication that publishes inserts has different criteria than
updates, so such behavior (a particular row when inserted will be
replicated but when it came as a result of an update it won't be
replicated) is expected.

Which brings me to a second problem, related to this one. Imagine you
create the subscription *after* inserting the two rows. In that case you
get this:

a | b | c
----+---+---
1 | 2 | 3
-1 | 3 | 4
(2 rows)

because tablesync.c ignores which actions the publication (and thus
the rowfilter) is defined for.

Yeah, this is the behavior of tablesync.c with or without rowfilter.
It ignores publication actions. So, if you update any tuple before
creation of subscription it will be replicated but the same update
won't be replicated after initial sync if the publication just
publishes 'insert'. I think we can't decide which data to copy based
on publication actions as COPY wouldn't know if a particular row is
due to a fresh insert or due to an update. In your example, it is
possible that row (-1, 3, 4) would have been there due to an update.

Right. Which is why I think disallowing these two features (filtering
actions and row filters) might prevent this, because it eliminates this
ambiguity. It would not matter if a row was INSERTed or UPDATEd when
evaluating the row filter.

I think it's natural to expect that (INSERT + sync) and (sync + INSERT)
produce the same output on the subscriber.

I'm not sure we can actually make this perfectly sane with arbitrary
combinations of filters and actions. It would probably depend on whether
the actions are commutative, associative and stuff like that. But maybe
we can come up with restrictions that'd make this sane?

True, I think to some extent we rely on users to define it sanely
otherwise currently also it can easily lead to even replication being
stuck. This can happen when the user is trying to operate on the same
table and define publication/subscription on multiple nodes for it.
See [1] where we are trying to deal with such a problem.

[1] - https://commitfest.postgresql.org/38/3610/

That seems to deal with a circular replication, i.e. two logical
replication links - a bit like a multi-master. Not sure how that is
related to the issue we're discussing here?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#19)
Re: bogus: logical replication rows/cols combinations

On Sat, Apr 30, 2022 at 2:02 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/29/22 06:48, Amit Kapila wrote:

On Thu, Apr 28, 2022 at 11:00 PM Tomas Vondra

I think such issues due to ALTER of the publication are somewhat
expected, and I think users will understand they might need to resync
the subscription or something like that.

A similar example might be just changing the where condition,

create publication p for table t where (a > 10);

and then

alter publication p set table t where (a > 15);

If we replicated any rows with (a > 10) and (a <= 15), we'll just stop
replicating them. But if we re-create the subscription, we end up with a
different set of rows on the subscriber, omitting rows with (a <= 15).

In principle we'd need to replicate the ALTER somehow, to delete or
insert the rows that start/stop matching the row filter. It's a bit
similar to not replicating DDL, perhaps.

But I think the issue I've described is different, because you don't
have to change the subscriptions at all and you'll still have the
problem. I mean, imagine doing this:

-- publisher
create table t (a int primary key, b int);
create publication p for table t where (a > 10) with (publish='update');

-- subscriber
create table t (a int primary key, b int);
create subscription s connection '...' publication p;

-- publisher
insert into t select i, i from generate_series(1,20) s(i);
update t set b = b * 10;

-- subscriber
--> has no rows in "t"
--> recreate the subscription
drop subscription s;
create subscription s connection '...' publication p;

--> now it has all the rows with (a>10), because tablesync ignores
publication actions

The reason why I find this really annoying is that it makes it almost
impossible to setup two logical replicas that'd be "consistent", unless
you create them at the same time (= without any writes in between). And
it's damn difficult to think about the inconsistencies.

I understood your case related to the initial sync, and it occurs with
or without a row filter.

IMHO this all stems from allowing row filters and restricting pubactions
at the same time (notice this only used a single publication). So maybe
the best option would be to disallow combining these two features? That
would ensure the row filter is always applied to all actions in a
consistent manner, preventing all these issues.

Maybe that's not possible - maybe there are valid use cases that would
need such combination, and you mentioned replica identity might be an
issue

Yes, that is the reason we can't combine the row filters for all pubactions.

(and maybe requiring RIF with row filters is not desirable).

So maybe we should at least warn against this in the documentation?

Yeah, I find this the most suitable thing to do to address your
concern. I would like to add this information to the 'Initial
Snapshot' page with some examples (both with and without a row
filter).

True, I think to some extent we rely on users to define it sanely
otherwise currently also it can easily lead to even replication being
stuck. This can happen when the user is trying to operate on the same
table and define publication/subscription on multiple nodes for it.
See [1] where we trying to deal with such a problem.

[1] - https://commitfest.postgresql.org/38/3610/

That seems to deal with a circular replication, i.e. two logical
replication links - a bit like a multi-master. Not sure how that is
related to the issue we're discussing here?

It is not directly related to what we are discussing here but I was
trying to emphasize the point that users need to define the logical
replication via pub/sub sanely otherwise they might see some weird
behaviors like that.

--
With Regards,
Amit Kapila.

#21Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tomas Vondra (#15)
1 attachment(s)
Re: bogus: logical replication rows/cols combinations

On 2022-Apr-28, Tomas Vondra wrote:

Attached is a patch doing the same thing in tablesync. The overall idea
is to generate copy statement with CASE expressions, applying filters to
individual columns. For Alvaro's example, this generates something like

SELECT
(CASE WHEN (a < 0) OR (a > 0) THEN a ELSE NULL END) AS a,
(CASE WHEN (a > 0) THEN b ELSE NULL END) AS b,
(CASE WHEN (a < 0) THEN c ELSE NULL END) AS c
FROM uno WHERE (a < 0) OR (a > 0)

I've been reading the tablesync.c code you propose and the idea seems
correct. (I was distracted by wondering if a different data structure
would be more appropriate, because what's there looks slightly
uncomfortable to work with. But after playing around I can't find
anything that feels better in an obvious way.)

(I confess I'm a bit bothered by the fact that there are now three
different data structures in our code called PublicationInfo).

I propose some comment changes in the attached patch, and my
interpretation (untested) of the idea of optimizing for a single
publication. (In there I also rename logicalrep_relmap_free_entry
because it's confusing. That should be a separate patch but I didn't
split it before posting, apologies.)

There's a couple options how we might optimize this for common cases.
For example if there's just a single publication, there's no need to
generate the CASE expressions - the WHERE filter will do the trick.

Right.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

Attachments:

0001-Change-some-comments.patchtext/x-diff; charset=utf-8Download
From b8abd662b36aa588a7566fa7da3a2b1cd78d5e51 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri, 29 Apr 2022 17:41:34 +0200
Subject: [PATCH 1/2] Change some comments

---
 src/backend/replication/logical/relation.c  | 15 ++---
 src/backend/replication/logical/tablesync.c | 72 ++++++++++++---------
 2 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561a9a..9307a6bf1d 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -119,14 +119,13 @@ logicalrep_relmap_init(void)
 }
 
 /*
- * Free the entry of a relation map cache.
+ * Clear the contents of a LogicalRepRelMapEntry, freeing any subsidiary
+ * allocated members.  The entry itself is not freed.
  */
 static void
-logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
+logicalrep_relmap_clear_entry(LogicalRepRelMapEntry *entry)
 {
-	LogicalRepRelation *remoterel;
-
-	remoterel = &entry->remoterel;
+	LogicalRepRelation *remoterel = &entry->remoterel;
 
 	pfree(remoterel->nspname);
 	pfree(remoterel->relname);
@@ -165,13 +164,13 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 		logicalrep_relmap_init();
 
 	/*
-	 * HASH_ENTER returns the existing entry if present or creates a new one.
+	 * Find an hashtable entry for this rel, or create a new one.  If there
+	 * was one, clear its contents so we can fill it fresh.
 	 */
 	entry = hash_search(LogicalRepRelMap, (void *) &remoterel->remoteid,
 						HASH_ENTER, &found);
-
 	if (found)
-		logicalrep_relmap_free_entry(entry);
+		logicalrep_relmap_clear_entry(entry);
 
 	memset(entry, 0, sizeof(LogicalRepRelMapEntry));
 
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index fd547e16f4..82ee406eb4 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1100,15 +1100,14 @@ copy_table(Relation rel)
 	}
 	else
 	{
+		List	   *quals;
+		ListCell   *lc;
+
 		/*
 		 * For non-tables and tables with row filters, we need to do COPY
 		 * (SELECT ...), but we can't just do SELECT * because we need to not
 		 * copy generated columns. For tables with any row filters, build a
 		 * SELECT query with OR'ed row filters for COPY.
-		 *
-		 * FIXME can be simplified if all subscriptions have the same column
-		 * list (or no column list), in which case we don't need the CASE
-		 * expressions at all.
 		 */
 		appendStringInfoString(&cmd, "COPY (SELECT ");
 		for (int i = 0; i < lrel.natts; i++)
@@ -1146,7 +1145,17 @@ copy_table(Relation rel)
 					appendStringInfo(&qual, " OR %s", strVal(pubinfo->rowfilter));
 			}
 
-			if (no_filter)
+			/*
+			 * If there's no row filter (which includes the case of several
+			 * publications of which at least one has no row filter), or
+			 * there is a single publication, emit the bare column name.
+			 * (The latter works because the WHERE clause already omits the
+			 * non-matching rows; that only holds with a single publication.)
+			 * With several publications, emit a CASE expression that prints
+			 * the column only if the row meets the row filter of at least
+			 * one publication.
+			 */
+			if (no_filter || list_length(pubinfos) == 1)
 				appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
 			else
 				appendStringInfo(&cmd, "(CASE WHEN (%s) THEN %s ELSE NULL END) AS %s",
@@ -1169,38 +1178,41 @@ copy_table(Relation rel)
 
 		appendStringInfoString(&cmd, quote_qualified_identifier(lrel.nspname, lrel.relname));
 
+		/*
+		 * Now, based on the row filters, build a WHERE clause for the query.
+		 * If there's at least one publication with no row filter, then the
+		 * overall WHERE reduces to empty.
+		 */
+
+		quals = NIL;
+		foreach (lc, pubinfos)
 		{
-			List *qual = NIL;
-			ListCell *lc;
+			PublicationInfo *pubinfo = (PublicationInfo *) lfirst(lc);
 
-			foreach (lc, pubinfos)
+			if (pubinfo->rowfilter == NULL)
 			{
-				PublicationInfo *pubinfo = (PublicationInfo *) lfirst(lc);
-
-				if (pubinfo->rowfilter == NULL)
-				{
-					qual = NIL;
-					break;
-				}
-
-				qual = lappend(qual, pubinfo->rowfilter);
+				quals = NIL;
+				break;
 			}
 
-		/* list of OR'ed filters */
-		if (qual != NIL)
-		{
-			ListCell   *lc;
-			char	   *q = strVal(linitial(qual));
-
-			appendStringInfo(&cmd, " WHERE %s", q);
-			for_each_from(lc, qual, 1)
-			{
-				q = strVal(lfirst(lc));
-				appendStringInfo(&cmd, " OR %s", q);
-			}
-			list_free_deep(qual);
+			quals = lappend(quals, pubinfo->rowfilter);
 		}
 
+		/*
+		 * If we do have filters, write a WHERE clause that ORs them all
+		 * together.
+		 */
+		if (quals != NIL)
+		{
+			char	   *q = strVal(linitial(quals));
+
+			appendStringInfo(&cmd, " WHERE (%s)", q);
+			for_each_from(lc, quals, 1)
+			{
+				q = strVal(lfirst(lc));
+				appendStringInfo(&cmd, " OR (%s)", q);
+			}
+			list_free_deep(quals);
 		}
 
 		appendStringInfoString(&cmd, ") TO STDOUT");
-- 
2.30.2

#22Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#20)
Re: bogus: logical replication rows/cols combinations

On 2022-Apr-30, Amit Kapila wrote:

On Sat, Apr 30, 2022 at 2:02 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

That seems to deal with a circular replication, i.e. two logical
replication links - a bit like a multi-master. Not sure how is that
related to the issue we're discussing here?

It is not directly related to what we are discussing here but I was
trying to emphasize the point that users need to define the logical
replication via pub/sub sanely otherwise they might see some weird
behaviors like that.

I agree with that.

My proposal is that if users want to define multiple publications, and
their definitions conflict in a way that would behave ridiculously (==
bound to cause data inconsistencies eventually), an error should be
thrown. Maybe we will not be able to catch all bogus cases, but we can
be prepared for the most obvious ones, and patch later when we find
others.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)

#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#22)
Re: bogus: logical replication rows/cols combinations

On Sat, Apr 30, 2022 at 3:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-Apr-30, Amit Kapila wrote:

On Sat, Apr 30, 2022 at 2:02 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

That seems to deal with a circular replication, i.e. two logical
replication links - a bit like a multi-master. Not sure how is that
related to the issue we're discussing here?

It is not directly related to what we are discussing here but I was
trying to emphasize the point that users need to define the logical
replication via pub/sub sanely otherwise they might see some weird
behaviors like that.

I agree with that.

My proposal is that if users want to define multiple publications, and
their definitions conflict in a way that would behave ridiculously (==
bound to cause data inconsistencies eventually), an error should be
thrown. Maybe we will not be able to catch all bogus cases, but we can
be prepared for the most obvious ones, and patch later when we find
others.

I agree with throwing errors for obvious/known bogus cases, but do we
want to throw errors or restrict the combining of column lists when
row filters are present in all cases? See some examples [1] where it
may be valid to combine them.

[1]: /messages/by-id/CAA4eK1K+PkkC6_FDemGMC_i+Aakx+3=QG-g4We3BdCK7dK_bgA@mail.gmail.com

--
With Regards,
Amit Kapila.

#24Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#23)
Re: bogus: logical replication rows/cols combinations

On 2022-Apr-30, Amit Kapila wrote:

I agree with throwing errors for obvious/known bogus cases but do we
want to throw errors or restrict the combining of column lists when
row filters are present in all cases? See some examples [1 ] where it
may be valid to combine them.

I agree we should try to combine things when it is sensible to do so.
Another case that may make sense is when there are two or more publications
with identical column lists but different row filters -- in such cases,
as Tomas suggested, we should combine the filters with OR.

Also, if only INSERTs are published and not UPDATE/DELETEs, then it
might be sensible to combine everything, regardless of whether or not
the column lists and row filters match.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)

#25Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Alvaro Herrera (#21)
Re: bogus: logical replication rows/cols combinations

On 4/30/22 11:28, Alvaro Herrera wrote:

On 2022-Apr-28, Tomas Vondra wrote:

Attached is a patch doing the same thing in tablesync. The overall idea
is to generate copy statement with CASE expressions, applying filters to
individual columns. For Alvaro's example, this generates something like

SELECT
(CASE WHEN (a < 0) OR (a > 0) THEN a ELSE NULL END) AS a,
(CASE WHEN (a > 0) THEN b ELSE NULL END) AS b,
(CASE WHEN (a < 0) THEN c ELSE NULL END) AS c
FROM uno WHERE (a < 0) OR (a > 0)

I've been reading the tablesync.c code you propose and the idea seems
correct. (I was distracted by wondering if a different data structure
would be more appropriate, because what's there looks slightly
uncomfortable to work with. But after playing around I can't find
anything that feels better in an obvious way.)

(I confess I'm a bit bothered by the fact that there are now three
different data structures in our code called PublicationInfo).

True. I haven't really thought about naming of the data structures, so
maybe we should name them differently.

I propose some comment changes in the attached patch, and my
interpretation (untested) of the idea of optimizing for a single
publication. (In there I also rename logicalrep_relmap_free_entry
because it's confusing. That should be a separate patch but I didn't
split it before posting, apologies.)

There's a couple options how we might optimize this for common cases.
For example if there's just a single publication, there's no need to
generate the CASE expressions - the WHERE filter will do the trick.

Right.

OK, now that we agree on the approach in general, I'll look into these
optimizations (and the comments from your patch).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#26Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#18)
Re: bogus: logical replication rows/cols combinations

On 4/29/22 07:05, Amit Kapila wrote:

On Thu, Apr 28, 2022 at 5:56 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 27.04.22 12:33, Amit Kapila wrote:

Currently, when the subscription has multiple publications, we combine
the objects, and actions of those publications. It happens for
'publish_via_partition_root', publication actions, tables, column
lists, or row filters. I think the whole design works on this idea
even the initial table sync. I think it might need a major change
(which I am not sure about at this stage) if we want to make the
initial sync also behave similar to what you are proposing.

If one publication says "publish if insert" and another publication says
"publish if update", then the combination of that is clearly "publish if
insert or update". Similarly, if one publication says "WHERE (foo)" and
one says "WHERE (bar)", then the combination is "WHERE (foo OR bar)".

But if one publication says "publish columns a and b if condition-X" and
another publication says "publish columns a and c if not-condition-X",
then the combination is clearly *not* "publish columns a, b, c if true".
That is not logical, in the literal sense of that word.

So, what should be the behavior in the below cases:

Case-1:
pub1: "publish columns a and b if condition-X"
pub2: "publish column c if condition-X"

Isn't it okay to combine these?

Yes, I think it's reasonable to combine those. So the whole publication
will have

WHERE (condition-X)

and the column list will be (a,b,c).

Case-2:
pub1: "publish columns a and b if condition-X"
pub2: "publish columns c if condition-Y"

In this case the publication will have

WHERE (condition-X or condition-Y)

and there will be different column filters for different row sets:

if (condition-X and condition-Y)
=> (a,b,c)
else if (condition-X and NOT condition-Y)
=> (a,b)
else if (condition-Y and NOT condition-X)
=> (c)

I think this behavior is reasonable, and it's what the patch does.

Here Y is a subset of condition X (say something like condition-X:
"col1 > 5" and condition-Y: "col1 > 10").

What should we do in such a case?

I think if there are some cases where combining them is okay but in
other cases, it is not okay then it is better to prohibit 'not-okay'
cases if that is feasible.

Not sure I understand what's the (supposed) issue with this example.
We'll simply do this:

if (col1 > 5 and col1 > 10)
=> (a,b,c)
else if (col1 > 5 and col1 <= 10)
=> (a,b)
else if (col1 > 10 and col1 <= 5)
=> (c)

Obviously, the third branch is unreachable, because the if condition can
never be satisfied, so we can never see only column list (c). But that's
fine IMO. When phrased using the CASE expressions (as in tablesync) it's
probably somewhat less cumbersome.
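To make the per-row semantics concrete, here's a minimal Python sketch (hypothetical, not PostgreSQL internals) that models each publication as a (column set, row-filter predicate) pair and computes which columns get replicated for a given row:

```python
# Hypothetical model of combining publications: a column is replicated
# for a row only if at least one publication both includes the column
# and has a row filter matching that row.

def replicated_columns(row, publications):
    """publications: list of (column_set, predicate) pairs."""
    cols = set()
    for columns, predicate in publications:
        if predicate(row):
            cols.update(columns)
    return cols

# pub1: columns (a, b) WHERE col1 > 5; pub2: column (c) WHERE col1 > 10
pubs = [
    ({"a", "b"}, lambda r: r["col1"] > 5),
    ({"c"}, lambda r: r["col1"] > 10),
]
```

A row with col1 = 7 gets (a, b), a row with col1 = 12 gets (a, b, c), and since col1 > 10 implies col1 > 5, the (c)-only combination is indeed unreachable.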

I think it's easier to think about this using the "data redaction" example
where you specify which columns can be replicated under what condition.
Obviously, that's "orthogonal" in the sense that we specify a column list
for a row filter condition, not a row filter for a column. But in principle
it's the same thing, just different grammar.

And in that case it makes perfect sense that you don't blindly combine
the column lists from all publications, because that'd defeat the whole
point of filtering columns based on row filters.

Imagine you have a table with customers from different regions, and you want
to replicate the data somewhere else, but for some reason you can only
replicate details for one particular region, and a subset of columns for
everyone else. So you'd do something like this:

CREATE PUBLICATION p1 FOR TABLE customers (... all columns ...)
WHERE region = 'USA';

CREATE PUBLICATION p2 FOR TABLE customers (... subset of columns ...)
WHERE region != 'USA';

I think ignoring the row filters and just merging the column lists makes
no sense for this use case.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#23)
Re: bogus: logical replication rows/cols combinations

On 4/30/22 12:11, Amit Kapila wrote:

On Sat, Apr 30, 2022 at 3:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-Apr-30, Amit Kapila wrote:

On Sat, Apr 30, 2022 at 2:02 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

That seems to deal with a circular replication, i.e. two logical
replication links - a bit like a multi-master. Not sure how is that
related to the issue we're discussing here?

It is not directly related to what we are discussing here but I was
trying to emphasize the point that users need to define the logical
replication via pub/sub sanely otherwise they might see some weird
behaviors like that.

I agree with that.

My proposal is that if users want to define multiple publications, and
their definitions conflict in a way that would behave ridiculously (==
bound to cause data inconsistencies eventually), an error should be
thrown. Maybe we will not be able to catch all bogus cases, but we can
be prepared for the most obvious ones, and patch later when we find
others.

I agree with throwing errors for obvious/known bogus cases but do we
want to throw errors or restrict the combining of column lists when
row filters are present in all cases? See some examples [1 ] where it
may be valid to combine them.

I think there are three challenges:

(a) Deciding what's an obvious bug or an unsupported case (e.g. because
it's not clear what's the correct behavior / way to merge column lists).

(b) When / where to detect the issue.

(c) Making sure this does not break/prevent existing use cases.

As I said before [1], I think the issue stems from essentially allowing
DML to have different row filters / column lists. So we could forbid
publications to specify WITH (publish=...) and one of the two features,
or make sure subscription does not combine multiple such publications.

The second option has the annoying consequence that it makes this
useless for the "data redaction" use case I described in [2], because
that relies on combining multiple publications.

Furthermore, what if the publications change after the subscriptions get
created? Will we be able to detect the error etc.?

So I'd prefer the first option, but maybe that prevents some useful use
cases too ...

regards

[1]: /messages/by-id/45d27a8a-7c7a-88e8-a3db-c7c1d144df5e@enterprisedb.com

[2]: /messages/by-id/338e719c-4bc8-f40a-f701-e29543a264e4@enterprisedb.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#27)
Re: bogus: logical replication rows/cols combinations

On Mon, May 2, 2022 at 3:27 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/30/22 12:11, Amit Kapila wrote:

On Sat, Apr 30, 2022 at 3:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

My proposal is that if users want to define multiple publications, and
their definitions conflict in a way that would behave ridiculously (==
bound to cause data inconsistencies eventually), an error should be
thrown. Maybe we will not be able to catch all bogus cases, but we can
be prepared for the most obvious ones, and patch later when we find
others.

I agree with throwing errors for obvious/known bogus cases but do we
want to throw errors or restrict the combining of column lists when
row filters are present in all cases? See some examples [1 ] where it
may be valid to combine them.

I think there are three challenges:

(a) Deciding what's an obvious bug or an unsupported case (e.g. because
it's not clear what's the correct behavior / way to merge column lists).

(b) When / where to detect the issue.

(c) Making sure this does not break/prevent existing use cases.

As I said before [1], I think the issue stems from essentially allowing
DML to have different row filters / column lists. So we could forbid
publications to specify WITH (publish=...) and one of the two features,

I don't think this is feasible for row filters because that would mean
publishing all actions because we have a restriction that all columns
referenced in the row filter expression are part of the REPLICA
IDENTITY index. This restriction is only valid for updates/deletes, so
if we allow all pubactions then this will be imposed on inserts as
well. A similar restriction is there for column lists as well, so I
don't think we can do it there as well. Do you have some idea to
address it?

or make sure subscription does not combine multiple such publications.

Yeah, or don't allow to define such publications in the first place so
that different subscriptions can't combine them but I guess that might
forbid some useful cases as well where publication may not get
combined with other publications.

The second option has the annoying consequence that it makes this
useless for the "data redaction" use case I described in [2], because
that relies on combining multiple publications.

True, but as a workaround users can create different subscriptions for
different publications.

Furthermore, what if the publications change after the subscriptions get
created? Will we be able to detect the error etc.?

I think that apart from 'Create Subscription', the same check
needs to be added for Alter Subscription ... Refresh and Alter
Subscription ... Enable.

In the publication side, we need an additional check in Alter
Publication ... SET table variant. One idea is that we get all other
publications for which the corresponding relation is defined. And then
if we find anything which we don't want to allow then we can throw an
error. This will forbid some useful cases as well, as mentioned above.
So, the other possibility is to expose all publications for a
walsender, and then we can find the exact set of publications where
the current publication is used with other publications and we can
check only those publications. So, if we have three walsenders
(walsnd1: pub1, pub2; walsnd2 pub2; walsnd3: pub2, pub3) in the system
and we are currently altering publication pub1 then we need to check
only pub3 for any conflicting conditions. Yet another simple way could
be that we don't allow to change column list via Alter Publication ...
Set variant because the other variants anyway need REFRESH publication
which we have covered.
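The walsender-based narrowing described above could be sketched roughly as follows (hypothetical structures and names, for illustration only; the server does not currently expose this mapping this way):

```python
# Hypothetical sketch: given the publication sets used by each active
# walsender, find the other publications that are combined with the one
# being altered -- only those need to be checked for conflicts.

def publications_to_check(walsender_pubs, altered_pub):
    others = set()
    for pubs in walsender_pubs:
        if altered_pub in pubs:
            others.update(pubs - {altered_pub})
    return others

# walsnd1: pub1, pub2; walsnd2: pub2; walsnd3: pub2, pub3
walsenders = [{"pub1", "pub2"}, {"pub2"}, {"pub2", "pub3"}]
```

Here altering pub1 would require checking only pub2 (matching the correction in the follow-up message), while altering pub2 would require checking both pub1 and pub3.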

I think it is tricky to decide what exactly we want to forbid, so, we
may want to follow something simple like if the column list and row
filters for a table are different in the required set of publications
then we treat it as an unsupported case. I think this will prohibit
some useful cases but should probably forbid the cases we are worried
about here.

--
With Regards,
Amit Kapila.

#29Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#28)
Re: bogus: logical replication rows/cols combinations

On Mon, May 2, 2022 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, May 2, 2022 at 3:27 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/30/22 12:11, Amit Kapila wrote:

On Sat, Apr 30, 2022 at 3:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

My proposal is that if users want to define multiple publications, and
their definitions conflict in a way that would behave ridiculously (==
bound to cause data inconsistencies eventually), an error should be
thrown. Maybe we will not be able to catch all bogus cases, but we can
be prepared for the most obvious ones, and patch later when we find
others.

I agree with throwing errors for obvious/known bogus cases but do we
want to throw errors or restrict the combining of column lists when
row filters are present in all cases? See some examples [1 ] where it
may be valid to combine them.

I think there are three challenges:

(a) Deciding what's an obvious bug or an unsupported case (e.g. because
it's not clear what's the correct behavior / way to merge column lists).

(b) When / where to detect the issue.

(c) Making sure this does not break/prevent existing use cases.

As I said before [1], I think the issue stems from essentially allowing
DML to have different row filters / column lists. So we could forbid
publications to specify WITH (publish=...) and one of the two features,

I don't think this is feasible for row filters because that would mean
publishing all actions because we have a restriction that all columns

Read the above sentence as: "publishing all actions and we have a
restriction that all columns ..."

referenced in the row filter expression are part of the REPLICA
IDENTITY index. This restriction is only valid for updates/deletes, so
if we allow all pubactions then this will be imposed on inserts as
well. A similar restriction is there for column lists as well, so I
don't think we can do it there as well. Do you have some idea to
address it?

or make sure subscription does not combine multiple such publications.

Yeah, or don't allow to define such publications in the first place so
that different subscriptions can't combine them but I guess that might
forbid some useful cases as well where publication may not get
combined with other publications.

The second option has the annoying consequence that it makes this
useless for the "data redaction" use case I described in [2], because
that relies on combining multiple publications.

True, but as a workaround users can create different subscriptions for
different publications.

Furthermore, what if the publications change after the subscriptions get
created? Will we be able to detect the error etc.?

I think from that apart from 'Create Subscription', the same check
needs to be added for Alter Subscription ... Refresh, Alter
Subscription ... Enable.

In the publication side, we need an additional check in Alter
Publication ... SET table variant. One idea is that we get all other
publications for which the corresponding relation is defined. And then
if we find anything which we don't want to allow then we can throw an
error. This will forbid some useful cases as well as mentioned above.
So, the other possibility is to expose all publications for a
walsender, and then we can find the exact set of publications where
the current publication is used with other publications and we can
check only those publications. So, if we have three walsenders
(walsnd1: pub1, pub2; walsnd2 pub2; walsnd3: pub2, pub3) in the system
and we are currently altering publication pub1 then we need to check
only pub3 for any conflicting conditions.

Typo, it should be pub2 instead of pub3 in the above sentence.

Yet another simple way could
be that we don't allow to change column list via Alter Publication ...
Set variant because the other variants anyway need REFRESH publication
which we have covered.

I think it is tricky to decide what exactly we want to forbid, so, we
may want to follow something simple like if the column list and row
filters for a table are different in the required set of publications
then we treat it as an unsupported case. I think this will prohibit
some useful cases but should probably forbid the cases we are worried
about here.

--
With Regards,
Amit Kapila.

#30Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#28)
Re: bogus: logical replication rows/cols combinations

On 5/2/22 07:31, Amit Kapila wrote:

On Mon, May 2, 2022 at 3:27 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 4/30/22 12:11, Amit Kapila wrote:

On Sat, Apr 30, 2022 at 3:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

My proposal is that if users want to define multiple publications, and
their definitions conflict in a way that would behave ridiculously (==
bound to cause data inconsistencies eventually), an error should be
thrown. Maybe we will not be able to catch all bogus cases, but we can
be prepared for the most obvious ones, and patch later when we find
others.

I agree with throwing errors for obvious/known bogus cases but do we
want to throw errors or restrict the combining of column lists when
row filters are present in all cases? See some examples [1 ] where it
may be valid to combine them.

I think there are three challenges:

(a) Deciding what's an obvious bug or an unsupported case (e.g. because
it's not clear what's the correct behavior / way to merge column lists).

(b) When / where to detect the issue.

(c) Making sure this does not break/prevent existing use cases.

As I said before [1], I think the issue stems from essentially allowing
DML to have different row filters / column lists. So we could forbid
publications to specify WITH (publish=...) and one of the two features,

I don't think this is feasible for row filters because that would mean
publishing all actions because we have a restriction that all columns
referenced in the row filter expression are part of the REPLICA
IDENTITY index. This restriction is only valid for updates/deletes, so
if we allow all pubactions then this will be imposed on inserts as
well. A similar restriction is there for column lists as well, so I
don't think we can do it there as well. Do you have some idea to
address it?

No, I haven't thought about how exactly to implement this, and I have
not thought about how to deal with the replica identity issues. My
thoughts were that we'd only really need this for tables with row
filters and/or column lists, treating it as a cost of those features.

But yeah, it seems annoying.

or make sure subscription does not combine multiple such publications.

Yeah, or don't allow to define such publications in the first place so
that different subscriptions can't combine them but I guess that might
forbid some useful cases as well where publication may not get
combined with other publications.

But how would you check that? You don't know which publications will be
combined by a subscription until you create the subscription, right?

The second option has the annoying consequence that it makes this
useless for the "data redaction" use case I described in [2], because
that relies on combining multiple publications.

True, but as a workaround users can create different subscriptions for
different publications.

Won't that replicate duplicate data, when the row filters are not
mutually exclusive?

Furthermore, what if the publications change after the subscriptions get
created? Will we be able to detect the error etc.?

I think from that apart from 'Create Subscription', the same check
needs to be added for Alter Subscription ... Refresh, Alter
Subscription ... Enable.

In the publication side, we need an additional check in Alter
Publication ... SET table variant. One idea is that we get all other
publications for which the corresponding relation is defined. And then
if we find anything which we don't want to allow then we can throw an
error. This will forbid some useful cases as well as mentioned above.
So, the other possibility is to expose all publications for a
walsender, and then we can find the exact set of publications where
the current publication is used with other publications and we can
check only those publications. So, if we have three walsenders
(walsnd1: pub1, pub2; walsnd2 pub2; walsnd3: pub2, pub3) in the system
and we are currently altering publication pub1 then we need to check
only pub3 for any conflicting conditions. Yet another simple way could
be that we don't allow to change column list via Alter Publication ...
Set variant because the other variants anyway need REFRESH publication
which we have covered.

I think it is tricky to decide what exactly we want to forbid, so, we
may want to follow something simple like if the column list and row
filters for a table are different in the required set of publications
then we treat it as an unsupported case. I think this will prohibit
some useful cases but should probably forbid the cases we are worried
about here.

I don't have a clear idea on what the right tradeoff is :-(

Maybe we're digressing a bit from the stuff Alvaro complained about
initially. Arguably the existing column list behavior is surprising and
would not work with reasonable use cases. So let's fix it.

But maybe you're right validating row filters is a step too far. Yes,
users may define strange combinations of publications, but is that
really an issue we have to solve?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#31Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tomas Vondra (#30)
Re: bogus: logical replication rows/cols combinations

On 2022-May-02, Tomas Vondra wrote:

On 5/2/22 07:31, Amit Kapila wrote:

Yeah, or don't allow to define such publications in the first place so
that different subscriptions can't combine them but I guess that might
forbid some useful cases as well where publication may not get
combined with other publications.

But how would you check that? You don't know which publications will be
combined by a subscription until you create the subscription, right?

... and I think this poses a problem: if the publisher has multiple
publications and the subscriber later uses those to create a combined
subscription, we can check at CREATE SUBSCRIPTION time that they can be
combined correctly. But if the publisher decides to change the
publications changing the rules and they are no longer consistent, can
we throw an error at ALTER PUBLICATION point? If the publisher can
detect that they are being used together by some subscription, then
maybe we can check consistency in the publication side and everything is
all right. But I'm not sure that the publisher knows who is subscribed
to what, so this might not be an option.

The latter ultimately means that we aren't sure that a combined
subscription is safe. And in turn this means that a pg_dump of such a
database cannot be restored (because the CREATE SUBSCRIPTION will be
rejected as being inconsistent).

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#32Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Alvaro Herrera (#31)
Re: bogus: logical replication rows/cols combinations

On 5/2/22 12:17, Alvaro Herrera wrote:

On 2022-May-02, Tomas Vondra wrote:

On 5/2/22 07:31, Amit Kapila wrote:

Yeah, or don't allow to define such publications in the first place so
that different subscriptions can't combine them but I guess that might
forbid some useful cases as well where publication may not get
combined with other publications.

But how would you check that? You don't know which publications will be
combined by a subscription until you create the subscription, right?

... and I think this poses a problem: if the publisher has multiple
publications and the subscriber later uses those to create a combined
subscription, we can check at CREATE SUBSCRIPTION time that they can be
combined correctly. But if the publisher decides to change the
publications changing the rules and they are no longer consistent, can
we throw an error at ALTER PUBLICATION point? If the publisher can
detect that they are being used together by some subscription, then
maybe we can check consistency in the publication side and everything is
all right. But I'm not sure that the publisher knows who is subscribed
to what, so this might not be an option.

AFAIK we don't track that (publication/subscription mapping). The
publications are listed in the publication_names parameter of the
START_REPLICATION command.

The latter ultimately means that we aren't sure that a combined
subscription is safe. And in turn this means that a pg_dump of such a
database cannot be restored (because the CREATE SUBSCRIPTION will be
rejected as being inconsistent).

We could do this check when executing the START_REPLICATION command, no?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tomas Vondra (#32)
Re: bogus: logical replication rows/cols combinations

On 2022-May-02, Tomas Vondra wrote:

On 5/2/22 12:17, Alvaro Herrera wrote:

The latter ultimately means that we aren't sure that a combined
subscription is safe. And in turn this means that a pg_dump of such a
database cannot be restored (because the CREATE SUBSCRIPTION will be
rejected as being inconsistent).

We could do this check when executing the START_REPLICATION command, no?

Ah! That sounds like it might work: we throw WARNINGs at CREATE
SUBSCRIPTION (so that users are immediately aware in case something is
going to fail later, but the objects are still created and they can fix
the publications afterwards), but the real ERROR is in START_REPLICATION.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"

#34Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#32)
Re: bogus: logical replication rows/cols combinations

On Mon, May 2, 2022 at 3:53 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/2/22 12:17, Alvaro Herrera wrote:

On 2022-May-02, Tomas Vondra wrote:

On 5/2/22 07:31, Amit Kapila wrote:

Yeah, or don't allow defining such publications in the first place so
that different subscriptions can't combine them but I guess that might
forbid some useful cases as well where publication may not get
combined with other publications.

But how would you check that? You don't know which publications will be
combined by a subscription until you create the subscription, right?

Yeah, I was thinking of checking all publications where the same
relation is published, but as mentioned that may not be a very good
option, as it would unnecessarily block many valid cases.

... and I think this poses a problem: if the publisher has multiple
publications and the subscriber later uses those to create a combined
subscription, we can check at CREATE SUBSCRIPTION time that they can be
combined correctly. But if the publisher decides to change the
publications changing the rules and they are no longer consistent, can
we throw an error at ALTER PUBLICATION point? If the publisher can
detect that they are being used together by some subscription, then
maybe we can check consistency in the publication side and everything is
all right. But I'm not sure that the publisher knows who is subscribed
to what, so this might not be an option.

AFAIK we don't track that (publication/subscription mapping). The
publications are listed in publication_names parameter of the
START_REPLICATION command.

We don't do that currently but we can as mentioned in my previous
email [1]. Let me write the relevant part again. We need to expose all
publications for a walsender, and then we can find the exact set of
publications where the current publication is used with other
publications and we can check only those publications. So, if we have
three walsenders (walsnd1: pub1, pub2; walsnd2 pub2; walsnd3: pub2,
pub3) in the system and we are currently altering publication pub1
then we need to check only pub3 for any conflicting conditions.
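
The set logic described here could be sketched roughly as follows (plain
Python; the function name and the walsender-to-publications mapping are
illustrative only, not actual PostgreSQL structures):

```python
# Rough sketch of the proposed check: given each walsender's list of
# publications, find which other publications are combined with the
# publication being altered, and hence need a consistency check.

def publications_to_check(walsender_pubs, altered_pub):
    """Publications combined with altered_pub by at least one
    walsender (excluding altered_pub itself)."""
    combined = set()
    for pubs in walsender_pubs.values():
        if altered_pub in pubs:
            combined.update(pubs)
    combined.discard(altered_pub)
    return combined

# With walsnd1: pub1, pub2; walsnd2: pub2; walsnd3: pub2, pub3,
# altering pub2 means both pub1 and pub3 need to be checked.
```

A publication used by no connected walsender yields an empty set, which
is why disconnected subscribers can only be caught at START_REPLICATION
time.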

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
can have one such LogicalDecodingContext and we can probably share it
via shared memory?

[1]: /messages/by-id/CAA4eK1LGX-ig==QyL+==nKvcAS3qFU7=NiKL77ukUT-Q_4XncQ@mail.gmail.com

--
With Regards,
Amit Kapila.

#35Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#30)
Re: bogus: logical replication rows/cols combinations

On Mon, May 2, 2022 at 3:05 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/2/22 07:31, Amit Kapila wrote:

On Mon, May 2, 2022 at 3:27 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

The second option has the annoying consequence that it makes this
useless for the "data redaction" use case I described in [2], because
that relies on combining multiple publications.

True, but as a workaround users can create different subscriptions for
different publications.

Won't that replicate duplicate data, when the row filters are not
mutually exclusive?

True, but this is a recommendation for mutually exclusive data, and as
far as I can understand the example given by you [1] and Alvaro has
mutually exclusive conditions. In your example, one of the
publications has a condition (region = 'USA') and the other
publication has a condition (region != 'USA'), so will there be a
problem in using different subscriptions for such cases?

[1]: /messages/by-id/338e719c-4bc8-f40a-f701-e29543a264e4@enterprisedb.com

--
With Regards,
Amit Kapila.

#36Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#34)
Re: bogus: logical replication rows/cols combinations

On 2022-May-02, Amit Kapila wrote:

We don't do that currently but we can as mentioned in my previous
email [1]. Let me write the relevant part again. We need to expose all
publications for a walsender, and then we can find the exact set of
publications where the current publication is used with other
publications and we can check only those publications. So, if we have
three walsenders (walsnd1: pub1, pub2; walsnd2 pub2; walsnd3: pub2,
pub3) in the system and we are currently altering publication pub1
then we need to check only pub3 for any conflicting conditions.

Hmm ... so what happens in the current system, if you have a running
walsender and modify the publication concurrently? Will the subscriber
start getting the changes with the new publication definition, at some
arbitrary point in the middle of their stream? If that's what we do,
maybe we should have a signalling system which disconnects all
walsenders using that publication, so that they can connect and receive
the new definition.

I don't see anything in the publication DDL that interacts with
walsenders -- perhaps I'm overlooking something.

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
can have one such LogicalDecodingContext and we can probably share it
via shared memory?

I guess we need to create a DSM each time a walsender opens a
connection, at START_REPLICATION time. Then ALTER PUBLICATION needs to
connect to all DSMs of all running walsenders and see if they are
reading from it. Is that what you have in mind? Alternatively, we
could have one DSM per publication with a PID array of all walsenders
that are sending it (each walsender needs to add its PID as it starts).
The latter might be better.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)

#37Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tomas Vondra (#15)
Re: bogus: logical replication rows/cols combinations

On 2022-Apr-28, Tomas Vondra wrote:

SELECT
(CASE WHEN (a < 0) OR (a > 0) THEN a ELSE NULL END) AS a,
(CASE WHEN (a > 0) THEN b ELSE NULL END) AS b,
(CASE WHEN (a < 0) THEN c ELSE NULL END) AS c
FROM uno WHERE (a < 0) OR (a > 0)

BTW, looking at the new COPY commands, the idea of "COPY table_foo
(PUBLICATION pub1, pub2)" is looking more and more attractive, as a
replacement for having the replica cons up an ad-hoc subquery to COPY
from. Something to think about for pg16, maybe.
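
The ad-hoc subquery quoted above can be constructed mechanically from
the per-publication column lists and row filters. A rough Python sketch
of that construction (function name and representation are illustrative
only; the column and filter strings are assumed to be pre-quoted SQL
fragments):

```python
# Illustrative sketch (not PostgreSQL code): build the redacting
# initial-sync query from per-publication column lists and row
# filters. Each publication is a (columns, row_filter) pair.

def build_sync_query(table, all_columns, pubs):
    select_items = []
    for col in all_columns:
        # A column's value is visible only in rows matching the filter
        # of some publication that includes the column.
        conds = [f if f is not None else "true"
                 for cols, f in pubs if col in cols]
        if not conds:
            continue  # column not published by any publication
        cond = " OR ".join(f"({c})" for c in conds)
        select_items.append(
            f"(CASE WHEN {cond} THEN {col} ELSE NULL END) AS {col}")
    # A row is replicated if it matches any publication's filter;
    # a publication without a filter publishes every row.
    if any(f is None for _, f in pubs):
        where = "true"
    else:
        where = " OR ".join(f"({f})" for _, f in pubs)
    return f"SELECT {', '.join(select_items)} FROM {table} WHERE {where}"
```

Applied to the uno/dos example from the start of the thread, this
produces exactly the CASE/WHERE query shown above.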

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick." (Andrew Sullivan)

#38Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Alvaro Herrera (#36)
Re: bogus: logical replication rows/cols combinations

On 5/2/22 13:44, Alvaro Herrera wrote:

On 2022-May-02, Amit Kapila wrote:

We don't do that currently but we can as mentioned in my previous
email [1]. Let me write the relevant part again. We need to expose all
publications for a walsender, and then we can find the exact set of
publications where the current publication is used with other
publications and we can check only those publications. So, if we have
three walsenders (walsnd1: pub1, pub2; walsnd2 pub2; walsnd3: pub2,
pub3) in the system and we are currently altering publication pub1
then we need to check only pub3 for any conflicting conditions.

Hmm ... so what happens in the current system, if you have a running
walsender and modify the publication concurrently? Will the subscriber
start getting the changes with the new publication definition, at some
arbitrary point in the middle of their stream? If that's what we do,
maybe we should have a signalling system which disconnects all
walsenders using that publication, so that they can connect and receive
the new definition.

I don't see anything in the publication DDL that interacts with
walsenders -- perhaps I'm overlooking something.

pgoutput.c relies on relcache callbacks to get notified of changes.
See the stuff that touches replicate_valid and publications_valid. So
the walsender should notice the changes immediately.

Maybe you have some particular case in mind, though?

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
can have one such LogicalDecodingContext and we can probably share it
via shared memory?

I guess we need to create a DSM each time a walsender opens a
connection, at START_REPLICATION time. Then ALTER PUBLICATION needs to
connect to all DSMs of all running walsenders and see if they are
reading from it. Is that what you have in mind? Alternatively, we
could have one DSM per publication with a PID array of all walsenders
that are sending it (each walsender needs to add its PID as it starts).
The latter might be better.

I don't quite follow what we're trying to build here. The walsender
already knows which publications it works with - how else would
pgoutput.c know that? So the walsender should be able to validate the
stuff it's supposed to replicate is OK.

Why would we need to know publications replicated by other walsenders?
And what if the subscriber is not connected at the moment? In that case
there'll be no walsender.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#39Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tomas Vondra (#38)
Re: bogus: logical replication rows/cols combinations

On 2022-May-02, Tomas Vondra wrote:

pgoutput.c relies on relcache callbacks to get notified of changes.
See the stuff that touches replicate_valid and publications_valid. So
the walsender should notice the changes immediately.

Hmm, I suppose that makes any changes easy enough to detect. We don't
need a separate signalling mechanism.

But it does mean that the walsender needs to test the consistency of
[rowfilter, column list, published actions] whenever they change for any
of the publications it is working with (when there is more than one), and
disconnect if the combination no longer complies with the rules. By the
next time the replica tries to connect, START_REPLICATION will throw the
error.

Why would we need to know publications replicated by other walsenders?
And what if the subscriber is not connected at the moment? In that case
there'll be no walsender.

Sure, if the replica is not connected then there's no issue -- as you
say, that replica will fail at START_REPLICATION time.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La gente vulgar sólo piensa en pasar el tiempo;
el que tiene talento, en aprovecharlo"

#40Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#35)
Re: bogus: logical replication rows/cols combinations

On 5/2/22 13:23, Amit Kapila wrote:

On Mon, May 2, 2022 at 3:05 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/2/22 07:31, Amit Kapila wrote:

On Mon, May 2, 2022 at 3:27 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

The second option has the annoying consequence that it makes this
useless for the "data redaction" use case I described in [2], because
that relies on combining multiple publications.

True, but as a workaround users can create different subscriptions for
different publications.

Won't that replicate duplicate data, when the row filters are not
mutually exclusive?

True, but this is a recommendation for mutually exclusive data, and as
far as I can understand the example given by you [1] and Alvaro has
mutually exclusive conditions. In your example, one of the
publications has a condition (region = 'USA') and the other
publication has a condition (region != 'USA'), so will there be a
problem in using different subscriptions for such cases?

I kept that example intentionally simple, but I'm sure we could come up
with more complex use cases. Following the "data redaction" idea, we
could also apply the "deny all" approach, and do something like this:

-- replicate the minimal column list by default (replica identity)
CREATE PUBLICATION p1 FOR TABLE t (id, region);

-- replicate more columns for the selected region
CREATE PUBLICATION p2 FOR TABLE t (...) WHERE (region = 'USA')

Now, I admit this is something I just made up, but I think it seems like
a pretty common approach.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#41Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Alvaro Herrera (#39)
Re: bogus: logical replication rows/cols combinations

On 5/2/22 19:51, Alvaro Herrera wrote:

On 2022-May-02, Tomas Vondra wrote:

pgoutput.c relies on relcache callbacks to get notified of changes.
See the stuff that touches replicate_valid and publications_valid. So
the walsender should notice the changes immediately.

Hmm, I suppose that makes any changes easy enough to detect. We don't
need a separate signalling mechanism.

But it does mean that the walsender needs to test the consistency of
[rowfilter, column list, published actions] whenever they change for any
of the publications it is working with (when there is more than one), and
disconnect if the combination no longer complies with the rules. By the
next time the replica tries to connect, START_REPLICATION will throw the
error.

Why would we need to know publications replicated by other walsenders?
And what if the subscriber is not connected at the moment? In that case
there'll be no walsender.

Sure, if the replica is not connected then there's no issue -- as you
say, that replica will fail at START_REPLICATION time.

Right, I got confused a bit.

Anyway, I think the main challenge is defining what exactly we want to
check, in order to ensure "sensible" behavior, without preventing way
too many sensible use cases.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#42Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Tomas Vondra (#26)
Re: bogus: logical replication rows/cols combinations

On 01.05.22 23:42, Tomas Vondra wrote:

Imagine having a table with customers from different regions, and you want
to replicate the data somewhere else, but for some reason you can only
replicate details for one particular region, and a subset of columns for
everyone else. So you'd do something like this:

CREATE PUBLICATION p1 FOR TABLE customers (... all columns ...)
WHERE region = 'USA';

CREATE PUBLICATION p2 FOR TABLE customers (... subset of columns ...)
WHERE region != 'USA';

I think ignoring the row filters and just merging the column lists makes
no sense for this use case.

I'm thinking now the underlying problem is that we shouldn't combine
column lists at all. Examples like the above where you want to redact
values somehow are better addressed with something like triggers or an
actual "column filter" that works dynamically or some other mechanism.

The main purpose, in my mind, of column lists is if the tables
statically have different shapes on publisher and subscriber. Perhaps
for space reasons or regulatory reasons you don't want to replicate
everything. But then it doesn't make sense to combine column lists. If
you decide over here that the subscriber table has this shape and over
there that the subscriber table has that other shape, then the
combination of the two will be a table that has neither shape and so
will not work for anything.

I think in general we should be much more restrictive in how we combine
publications. Unless we are really sure it makes sense, we should
disallow it. Users can always make a new publication with different
settings and subscribe to that directly.

#43Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#41)
Re: bogus: logical replication rows/cols combinations

On Tue, May 3, 2022 at 12:10 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/2/22 19:51, Alvaro Herrera wrote:

Why would we need to know publications replicated by other walsenders?
And what if the subscriber is not connected at the moment? In that case
there'll be no walsender.

Sure, if the replica is not connected then there's no issue -- as you
say, that replica will fail at START_REPLICATION time.

Right, I got confused a bit.

Anyway, I think the main challenge is defining what exactly we want to
check, in order to ensure "sensible" behavior, without preventing way
too many sensible use cases.

I could think of below two options:
1. Forbid any case where column list is different for the same table
when combining publications.
2. Forbid if the column list and row filters for a table are different
in the set of publications we are planning to combine. This means we
will allow combining column lists when row filters are not present or
when column list is the same (we don't get anything additional by
combining but the idea is we won't forbid such cases) and row filters
are different.
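
For concreteness, the two options could be sketched roughly like this
(plain Python, with each publication modeled as a (column_list,
row_filter) pair; the names and representation are illustrative, not
PostgreSQL internals):

```python
# Rough sketch of the two proposed restrictions for deciding whether
# a set of publications covering the same table may be combined.

def combinable_option1(pubs):
    """Option 1: all column lists for the table must be identical."""
    column_lists = {cols for cols, _ in pubs}
    return len(column_lists) <= 1

def combinable_option2(pubs):
    """Option 2: differing column lists are allowed only when no
    publication has a row filter (identical lists are always OK)."""
    if combinable_option1(pubs):
        return True
    return all(rowfilter is None for _, rowfilter in pubs)
```

Under either option, the uno/dos example from the start of the thread
(different column lists combined with different row filters) is
forbidden; option 2 additionally permits different column lists when
every row is published.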

Now, I think the points in favor of (1) are that the main purpose of
introducing a column list are: (a) the structure/schema of the
subscriber is different from the publisher, (b) want to hide sensitive
columns data. In both cases, it should be fine if we follow (1) and
from Peter E.'s latest email [1] he also seems to be indicating the
same. If we want to be slightly more relaxed then we can probably (2).
We can decide on something else as well but I feel it should be such
that it is easy to explain.

[1]: /messages/by-id/47dd2cb9-4e96-169f-15ac-f9407fb54d43@enterprisedb.com

--
With Regards,
Amit Kapila.

#44Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#36)
Re: bogus: logical replication rows/cols combinations

On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-May-02, Amit Kapila wrote:

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
can have one such LogicalDecodingContext and we can probably share it
via shared memory?

I guess we need to create a DSM each time a walsender opens a
connection, at START_REPLICATION time. Then ALTER PUBLICATION needs to
connect to all DSMs of all running walsenders and see if they are
reading from it. Is that what you have in mind?

Yes, something along these lines. We need a way to get the list of
publications each walsender is publishing data for.

Alternatively, we
could have one DSM per publication with a PID array of all walsenders
that are sending it (each walsender needs to add its PID as it starts).

I think for this we need to check the DSM for all the publications, and
I feel there will in general be more publications than walsenders, so
the previous approach seems better to me. However, any
one of these or similar ideas should be okay.

--
With Regards,
Amit Kapila.

#45Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Peter Eisentraut (#42)
Re: bogus: logical replication rows/cols combinations

On 5/2/22 22:34, Peter Eisentraut wrote:

On 01.05.22 23:42, Tomas Vondra wrote:

Imagine having a table with customers from different regions, and you want
to replicate the data somewhere else, but for some reason you can only
replicate details for one particular region, and a subset of columns for
everyone else. So you'd do something like this:

CREATE PUBLICATION p1 FOR TABLE customers (... all columns ...)
  WHERE region = 'USA';

CREATE PUBLICATION p2 FOR TABLE customers (... subset of columns ...)
  WHERE region != 'USA';

I think ignoring the row filters and just merging the column lists makes
no sense for this use case.

I'm thinking now the underlying problem is that we shouldn't combine
column lists at all.  Examples like the above where you want to redact
values somehow are better addressed with something like triggers or an
actual "column filter" that works dynamically or some other mechanism.

So what's wrong with merging the column lists as implemented in the v2
patch, posted a couple days ago?

I don't think triggers are a suitable alternative, as it executes on the
subscriber node. So you have to first copy the data to the remote node,
where it gets filtered. With column filters the data gets redacted on
the publisher.

The main purpose, in my mind, of column lists is if the tables
statically have different shapes on publisher and subscriber.  Perhaps
for space reasons or regulatory reasons you don't want to replicate
everything.  But then it doesn't make sense to combine column lists.  If
you decide over here that the subscriber table has this shape and over
there that the subscriber table has that other shape, then the
combination of the two will be a table that has neither shape and so
will not work for anything.

Yeah. If we intend to use column lists only to adapt to a different
schema on the subscriber node, then maybe it'd be fine to not merge
column lists. It'd probably be reasonable to allow at least cases with
multiple publications using the same column list, though. In that case
there's no ambiguity.

I think in general we should be much more restrictive in how we combine
publications.  Unless we are really sure it makes sense, we should
disallow it.  Users can always make a new publication with different
settings and subscribe to that directly.

I agree with that in principle - correct first, flexibility second. If
the behavior is not correct, it doesn't matter how flexible it is.

I still think the data redaction use case is valid/interesting, but if
we want to impose some restrictions I'm OK with that, as long as it's
done in a way that we can relax in the future to allow that use case
(that is, without introducing any incompatibilities).

However, what's the definition of "correctness" in this context? Without
that it's hard to say if the restrictions make the behavior any more
correct. It'd be unfortunate to impose restrictions, which will prevent
some use cases, only to discover we haven't actually made it correct.

For example, is it enough to restrict column lists, or does it need to
restrict e.g. row filters too? And does it need to consider other stuff,
like publications replicating different actions?

For example, if we allow different column lists (or row filters) for
different actions (one publication for insert, another one for update),
we still have the strange behavior described before.

And if we force users to use separate subscriptions, I'm not sure that
really improves the situation for users who actually need that. They'll
do that, and aside from all the problems they'll also face issues with
timing between the two concurrent subscriptions, having to decode stuff
multiple times, etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#46Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Tomas Vondra (#45)
Re: bogus: logical replication rows/cols combinations

On 03.05.22 21:40, Tomas Vondra wrote:

So what's wrong with merging the column lists as implemented in the v2
patch, posted a couple days ago?

Merging the column lists is ok if all other publication attributes
match. Otherwise, I think not.

I don't think triggers are a suitable alternative, as it executes on the
subscriber node. So you have to first copy the data to the remote node,
where it gets filtered. With column filters the data gets redacted on
the publisher.

Right, triggers are not currently a solution. But you could imagine a
redaction filter system that runs on the publisher that modifies rows
before they are sent out.

#47houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#43)
RE: bogus: logical replication rows/cols combinations

On Tuesday, May 3, 2022 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 3, 2022 at 12:10 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/2/22 19:51, Alvaro Herrera wrote:

Why would we need to know publications replicated by other

walsenders?

And what if the subscriber is not connected at the moment? In that case
there'll be no walsender.

Sure, if the replica is not connected then there's no issue -- as you
say, that replica will fail at START_REPLICATION time.

Right, I got confused a bit.

Anyway, I think the main challenge is defining what exactly we want to
check, in order to ensure "sensible" behavior, without preventing way
too many sensible use cases.

I could think of below two options:
1. Forbid any case where column list is different for the same table
when combining publications.
2. Forbid if the column list and row filters for a table are different
in the set of publications we are planning to combine. This means we
will allow combining column lists when row filters are not present or
when column list is the same (we don't get anything additional by
combining but the idea is we won't forbid such cases) and row filters
are different.

Now, I think the points in favor of (1) are that the main purpose of
introducing a column list are: (a) the structure/schema of the
subscriber is different from the publisher, (b) want to hide sensitive
columns data. In both cases, it should be fine if we follow (1) and
from Peter E.'s latest email [1] he also seems to be indicating the
same. If we want to be slightly more relaxed then we can probably (2).
We can decide on something else as well but I feel it should be such
that it is easy to explain.

I also think it makes sense to add a restriction like (1). I am planning to
implement the restriction if no one objects.

Best regards,
Hou zj

#48Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: houzj.fnst@fujitsu.com (#47)
Re: bogus: logical replication rows/cols combinations

On 5/6/22 05:23, houzj.fnst@fujitsu.com wrote:

On Tuesday, May 3, 2022 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 3, 2022 at 12:10 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/2/22 19:51, Alvaro Herrera wrote:

Why would we need to know publications replicated by other

walsenders?

And what if the subscriber is not connected at the moment? In that case
there'll be no walsender.

Sure, if the replica is not connected then there's no issue -- as you
say, that replica will fail at START_REPLICATION time.

Right, I got confused a bit.

Anyway, I think the main challenge is defining what exactly we want to
check, in order to ensure "sensible" behavior, without preventing way
too many sensible use cases.

I could think of below two options:
1. Forbid any case where column list is different for the same table
when combining publications.
2. Forbid if the column list and row filters for a table are different
in the set of publications we are planning to combine. This means we
will allow combining column lists when row filters are not present or
when column list is the same (we don't get anything additional by
combining but the idea is we won't forbid such cases) and row filters
are different.

Now, I think the points in favor of (1) are that the main purpose of
introducing a column list are: (a) the structure/schema of the
subscriber is different from the publisher, (b) want to hide sensitive
columns data. In both cases, it should be fine if we follow (1) and
from Peter E.'s latest email [1] he also seems to be indicating the
same. If we want to be slightly more relaxed then we can probably (2).
We can decide on something else as well but I feel it should be such
that it is easy to explain.

I also think it makes sense to add a restriction like (1). I am planning to
implement the restriction if no one objects.

I'm not going to block that approach if that's the consensus here,
though I'm not convinced.

Let me point out (1) does *not* work for the data redaction use case,
certainly not the example Alvaro and I presented, because that relies
on a combination of row filters and column filters. Requiring all column
lists to be the same (and not specific to a row filter) prevents that
example from working. Yes, you can create multiple subscriptions, but
that brings its own set of challenges too.

I doubt forcing users to use the more complex setup is good idea, and
combining the column lists per [1] seems sound to me.

That being said, the good thing is this restriction seems it might be
relaxed in the future to work per [1], without causing any backwards
compatibility issues.

Should we do something similar for row filters, though? It seems quite
weird we're so concerned about unexpected behavior due to combining
column lists (despite having a patch that makes it behave sanely), and
at the same time wave off similarly strange behavior due to combining
row filters because "that's what you get if you define the publications
in a strange way".

regards

[1]: /messages/by-id/5a85b8b7-fc1c-364b-5c62-0bb3e1e25824@enterprisedb.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#49Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#48)
Re: bogus: logical replication rows/cols combinations

On Fri, May 6, 2022 at 5:56 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I could think of below two options:
1. Forbid any case where column list is different for the same table
when combining publications.
2. Forbid if the column list and row filters for a table are different
in the set of publications we are planning to combine. This means we
will allow combining column lists when row filters are not present or
when column list is the same (we don't get anything additional by
combining but the idea is we won't forbid such cases) and row filters
are different.

Now, I think the points in favor of (1) are that the main purpose of
introducing a column list are: (a) the structure/schema of the
subscriber is different from the publisher, (b) want to hide sensitive
columns data. In both cases, it should be fine if we follow (1) and
from Peter E.'s latest email [1] he also seems to be indicating the
same. If we want to be slightly more relaxed then we can probably (2).
We can decide on something else as well but I feel it should be such
that it is easy to explain.

I also think it makes sense to add a restriction like (1). I am planning to
implement the restriction if no one objects.

I'm not going to block that approach if that's the consensus here,
though I'm not convinced.

Let me point out that (1) does *not* work for the data redaction use case,
certainly not the example Alvaro and I presented, because that relies
on a combination of row filters and column lists.

This should just forbid the case presented by Alvaro in his first
email in this thread [1].

Requiring all column
lists to be the same (and not specific to row filter) prevents that
example from working. Yes, you can create multiple subscriptions, but
that brings its own set of challenges too.

I doubt forcing users to use the more complex setup is a good idea, and
combining the column lists per [1] seems sound to me.

That being said, the good thing is this restriction seems it might be
relaxed in the future to work per [1], without causing any backwards
compatibility issues.

These are my thoughts as well. Even if we decide to go via the column
list merging approach (in selective cases), we need to do some
performance testing of that approach as it does much more work per
tuple. It is possible that the impact is not much, but it is still worth
evaluating, so let's first see the patch to prohibit combining the
column lists, then we can decide.

Should we do something similar for row filters, though? It seems quite
weird we're so concerned about unexpected behavior due to combining
column lists (despite having a patch that makes it behave sanely), and
at the same time wave off similarly strange behavior due to combining
row filters because "that's what you get if you define the publications
in a strange way".

During development, we found that we can't combine the row-filters for
'insert' and 'update'/'delete' because of replica identity
restrictions, so we have kept them separate. But if we come across
other such things then we can either try to fix those or forbid them.

[1]: /messages/by-id/202204251548.mudq7jbqnh7r@alvherre.pgsql

--
With Regards,
Amit Kapila.

#50Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#49)
Re: bogus: logical replication rows/cols combinations

On 5/6/22 15:40, Amit Kapila wrote:

On Fri, May 6, 2022 at 5:56 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I could think of below two options:
1. Forbid any case where column list is different for the same table
when combining publications.
2. Forbid if the column list and row filters for a table are different
in the set of publications we are planning to combine. This means we
will allow combining column lists when row filters are not present or
when column list is the same (we don't get anything additional by
combining but the idea is we won't forbid such cases) and row filters
are different.

Now, I think the points in favor of (1) are that the main purpose of
introducing a column list are: (a) the structure/schema of the
subscriber is different from the publisher, (b) want to hide sensitive
columns data. In both cases, it should be fine if we follow (1) and
from Peter E.'s latest email [1] he also seems to be indicating the
same. If we want to be slightly more relaxed then we can probably (2).
We can decide on something else as well but I feel it should be such
that it is easy to explain.

I also think it makes sense to add a restriction like (1). I am planning to
implement the restriction if no one objects.

I'm not going to block that approach if that's the consensus here,
though I'm not convinced.

Let me point out that (1) does *not* work for the data redaction use case,
certainly not the example Alvaro and I presented, because that relies
on a combination of row filters and column lists.

This should just forbid the case presented by Alvaro in his first
email in this thread [1].

Requiring all column
lists to be the same (and not specific to row filter) prevents that
example from working. Yes, you can create multiple subscriptions, but
that brings its own set of challenges too.

I doubt forcing users to use the more complex setup is a good idea, and
combining the column lists per [1] seems sound to me.

That being said, the good thing is this restriction seems it might be
relaxed in the future to work per [1], without causing any backwards
compatibility issues.

These are my thoughts as well. Even if we decide to go via the column
list merging approach (in selective cases), we need to do some
performance testing of that approach as it does much more work per
tuple. It is possible that the impact is not much but still worth
evaluating, so let's try to see the patch to prohibit combining the
column lists then we can decide.

Surely we could do some performance testing now. I doubt it's very
expensive - sure, you can construct cases with many row filters / column
lists, but how likely is that in practice?

Moreover, it's not like this would affect existing setups, so even if
it's a bit expensive, we may interpret that as the cost of the feature.

Should we do something similar for row filters, though? It seems quite
weird we're so concerned about unexpected behavior due to combining
column lists (despite having a patch that makes it behave sanely), and
at the same time wave off similarly strange behavior due to combining
row filters because "that's what you get if you define the publications
in a strange way".

During development, we found that we can't combine the row-filters for
'insert' and 'update'/'delete' because of replica identity
restrictions, so we have kept them separate. But if we came across
other such things then we can either try to fix those or forbid them.

I understand how we got to the current state. I'm just saying that this
allows defining separate publications for insert, update and delete
actions, and setting different row filters for each of them, which results
in behavior that is hard to explain/understand, especially when it comes
to tablesync.
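A minimal sketch of such a setup (names invented; this illustrates the scenario, not tested behavior):

```sql
CREATE TABLE t (a int PRIMARY KEY, b int);

-- Separate publications per action, each with its own row filter.
CREATE PUBLICATION p_ins FOR TABLE t WHERE (a > 0)
    WITH (publish = 'insert');
CREATE PUBLICATION p_upd FOR TABLE t WHERE (a > 100)
    WITH (publish = 'update, delete');

-- A single subscription combining both applies different filters to
-- inserts and to updates/deletes, and tablesync has to pick something.
```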

It seems quite strange to prohibit merging column lists because there
might be some strange behavior that no one described, and allow setups
with different row filters that definitely have strange behavior.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#51Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#36)
Re: bogus: logical replication rows/cols combinations

On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-May-02, Amit Kapila wrote:

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
can have one such LogicalDecodingContext and we can probably share it
via shared memory?

I guess we need to create a DSM each time a walsender opens a
connection, at START_REPLICATION time. Then ALTER PUBLICATION needs to
connect to all DSMs of all running walsenders and see if they are
reading from it. Is that what you have in mind? Alternatively, we
could have one DSM per publication with a PID array of all walsenders
that are sending it (each walsender needs to add its PID as it starts).
The latter might be better.

While thinking about using DSM here, I came across one of your commits,
f2f9fcb303, which seems to indicate that it is not a good idea to rely
on it. But I think you changed dynamic shared memory to fixed shared
memory usage there because that was more suitable, not because DSM is
unportable; I also see commit bcbd940806, which removed the 'none'
option of dynamic_shared_memory_type. So, I think it should be okay to
use DSM in this context. What do you think?

--
With Regards,
Amit Kapila.

#52Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#51)
Re: bogus: logical replication rows/cols combinations

On 5/7/22 07:36, Amit Kapila wrote:

On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-May-02, Amit Kapila wrote:

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
can have one such LogicalDecodingContext and we can probably share it
via shared memory?

I guess we need to create a DSM each time a walsender opens a
connection, at START_REPLICATION time. Then ALTER PUBLICATION needs to
connect to all DSMs of all running walsenders and see if they are
reading from it. Is that what you have in mind? Alternatively, we
could have one DSM per publication with a PID array of all walsenders
that are sending it (each walsender needs to add its PID as it starts).
The latter might be better.

While thinking about using DSM here, I came across one of your commits
f2f9fcb303 which seems to indicate that it is not a good idea to rely
on it but I think you have changed dynamic shared memory to fixed
shared memory usage because that was more suitable rather than DSM is
not portable. Because I see a commit bcbd940806 where we have removed
the 'none' option of dynamic_shared_memory_type. So, I think it should
be okay to use DSM in this context. What do you think?

Why would any of this be needed?

ALTER PUBLICATION will invalidate the RelationSyncEntry entries in all
walsenders, no? So AFAICS it should be enough to enforce the limitations
in get_rel_sync_entry, which is necessary anyway because the subscriber
may not be connected when ALTER PUBLICATION gets executed.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#53Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#52)
Re: bogus: logical replication rows/cols combinations

On Sun, May 8, 2022 at 11:41 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/7/22 07:36, Amit Kapila wrote:

On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-May-02, Amit Kapila wrote:

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
can have one such LogicalDecodingContext and we can probably share it
via shared memory?

I guess we need to create a DSM each time a walsender opens a
connection, at START_REPLICATION time. Then ALTER PUBLICATION needs to
connect to all DSMs of all running walsenders and see if they are
reading from it. Is that what you have in mind? Alternatively, we
could have one DSM per publication with a PID array of all walsenders
that are sending it (each walsender needs to add its PID as it starts).
The latter might be better.

While thinking about using DSM here, I came across one of your commits
f2f9fcb303 which seems to indicate that it is not a good idea to rely
on it but I think you have changed dynamic shared memory to fixed
shared memory usage because that was more suitable rather than DSM is
not portable. Because I see a commit bcbd940806 where we have removed
the 'none' option of dynamic_shared_memory_type. So, I think it should
be okay to use DSM in this context. What do you think?

Why would any of this be needed?

ALTER PUBLICATION will invalidate the RelationSyncEntry entries in all
walsenders, no? So AFAICS it should be enough to enforce the limitations
in get_rel_sync_entry,

Yes, that should be sufficient to enforce limitations in
get_rel_sync_entry() but it will lead to the following behavior:
a. The ALTER PUBLICATION command will succeed, but the error will only
show up later in the logs, and the user needs to check it and take
appropriate action. Until then the walsender will be in an error
loop, which means it will restart and hit the same error again until
the user takes some action.
b. As we use historic snapshots, even after the user takes action,
say by changing the publication, it won't be reflected. So, the option
for the user would be to drop their subscription.

Am I missing something? If not, are we okay with such behavior?
If yes, then I think it would be much easier implementation-wise and
probably advisable at this point. We can document it so that users are
careful and can take the necessary action if they get into such a
situation. Anyway, we can improve this in the future, as you also
suggested earlier.

which is necessary anyway because the subscriber
may not be connected when ALTER PUBLICATION gets executed.

If we are not okay with the resultant behavior of detecting this in
get_rel_sync_entry(), then we can solve this in some other way as
Alvaro has indicated in one of his responses which is to detect that
at start replication time probably in the subscriber-side.

--
With Regards,
Amit Kapila.

#54Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#53)
Re: bogus: logical replication rows/cols combinations

On 5/9/22 05:45, Amit Kapila wrote:

On Sun, May 8, 2022 at 11:41 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/7/22 07:36, Amit Kapila wrote:

On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-May-02, Amit Kapila wrote:

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
can have one such LogicalDecodingContext and we can probably share it
via shared memory?

I guess we need to create a DSM each time a walsender opens a
connection, at START_REPLICATION time. Then ALTER PUBLICATION needs to
connect to all DSMs of all running walsenders and see if they are
reading from it. Is that what you have in mind? Alternatively, we
could have one DSM per publication with a PID array of all walsenders
that are sending it (each walsender needs to add its PID as it starts).
The latter might be better.

While thinking about using DSM here, I came across one of your commits
f2f9fcb303 which seems to indicate that it is not a good idea to rely
on it but I think you have changed dynamic shared memory to fixed
shared memory usage because that was more suitable rather than DSM is
not portable. Because I see a commit bcbd940806 where we have removed
the 'none' option of dynamic_shared_memory_type. So, I think it should
be okay to use DSM in this context. What do you think?

Why would any of this be needed?

ALTER PUBLICATION will invalidate the RelationSyncEntry entries in all
walsenders, no? So AFAICS it should be enough to enforce the limitations
in get_rel_sync_entry,

Yes, that should be sufficient to enforce limitations in
get_rel_sync_entry() but it will lead to the following behavior:
a. The Alter Publication command will be successful but later in the
logs, the error will be logged and the user needs to check it and take
appropriate action. Till that time the walsender will be in an error
loop which means it will restart and again lead to the same error till
the user takes some action.
b. As we use historic snapshots, so even after the user takes action
say by changing publication, it won't be reflected. So, the option for
the user would be to drop their subscription.

Am I missing something? If not, then are we okay with such behavior?
If yes, then I think it would be much easier implementation-wise and
probably advisable at this point. We can document it so that users are
careful and can take necessary action if they get into such a
situation. Anyway, we can improve this in the future as you also suggested
earlier.

which is necessary anyway because the subscriber
may not be connected when ALTER PUBLICATION gets executed.

If we are not okay with the resultant behavior of detecting this in
get_rel_sync_entry(), then we can solve this in some other way as
Alvaro has indicated in one of his responses which is to detect that
at start replication time probably in the subscriber-side.

IMO that behavior is acceptable. We have to do that check anyway, and
the subscription may start failing after ALTER PUBLICATION for a number
of other reasons anyway so the user needs/should check the logs.

And if needed, we can improve this and start doing the proactive-checks
during ALTER PUBLICATION too.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#55Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#54)
Re: bogus: logical replication rows/cols combinations

On Wed, May 11, 2022 at 12:35 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/9/22 05:45, Amit Kapila wrote:

On Sun, May 8, 2022 at 11:41 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/7/22 07:36, Amit Kapila wrote:

On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-May-02, Amit Kapila wrote:

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
can have one such LogicalDecodingContext and we can probably share it
via shared memory?

I guess we need to create a DSM each time a walsender opens a
connection, at START_REPLICATION time. Then ALTER PUBLICATION needs to
connect to all DSMs of all running walsenders and see if they are
reading from it. Is that what you have in mind? Alternatively, we
could have one DSM per publication with a PID array of all walsenders
that are sending it (each walsender needs to add its PID as it starts).
The latter might be better.

While thinking about using DSM here, I came across one of your commits
f2f9fcb303 which seems to indicate that it is not a good idea to rely
on it but I think you have changed dynamic shared memory to fixed
shared memory usage because that was more suitable rather than DSM is
not portable. Because I see a commit bcbd940806 where we have removed
the 'none' option of dynamic_shared_memory_type. So, I think it should
be okay to use DSM in this context. What do you think?

Why would any of this be needed?

ALTER PUBLICATION will invalidate the RelationSyncEntry entries in all
walsenders, no? So AFAICS it should be enough to enforce the limitations
in get_rel_sync_entry,

Yes, that should be sufficient to enforce limitations in
get_rel_sync_entry() but it will lead to the following behavior:
a. The Alter Publication command will be successful but later in the
logs, the error will be logged and the user needs to check it and take
appropriate action. Till that time the walsender will be in an error
loop which means it will restart and again lead to the same error till
the user takes some action.
b. As we use historic snapshots, so even after the user takes action
say by changing publication, it won't be reflected. So, the option for
the user would be to drop their subscription.

Am I missing something? If not, then are we okay with such behavior?
If yes, then I think it would be much easier implementation-wise and
probably advisable at this point. We can document it so that users are
careful and can take necessary action if they get into such a
situation. Anyway, we can improve this in the future as you also suggested
earlier.

which is necessary anyway because the subscriber
may not be connected when ALTER PUBLICATION gets executed.

If we are not okay with the resultant behavior of detecting this in
get_rel_sync_entry(), then we can solve this in some other way as
Alvaro has indicated in one of his responses which is to detect that
at start replication time probably in the subscriber-side.

IMO that behavior is acceptable.

Fair enough, then we should go with a simpler approach to detect it in
pgoutput.c (get_rel_sync_entry).

We have to do that check anyway, and
the subscription may start failing after ALTER PUBLICATION for a number
of other reasons anyway so the user needs/should check the logs.

I think ALTER PUBLICATION won't ever lead to a failure in the walsender.
Sure, users can do something that causes subscriber-side failures,
such as constraint violations. Do you have some specific cases in
mind?

And if needed, we can improve this and start doing the proactive-checks
during ALTER PUBLICATION too.

Agreed.

--
With Regards,
Amit Kapila.

#56houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#55)
1 attachment(s)
RE: bogus: logical replication rows/cols combinations

On Wednesday, May 11, 2022 11:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 11, 2022 at 12:35 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/9/22 05:45, Amit Kapila wrote:

On Sun, May 8, 2022 at 11:41 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 5/7/22 07:36, Amit Kapila wrote:

On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera

<alvherre@alvh.no-ip.org> wrote:

On 2022-May-02, Amit Kapila wrote:

I think it is possible to expose a list of publications for each
walsender as it is stored in each walsenders
LogicalDecodingContext->output_plugin_private. AFAIK, each
walsender
can have one such LogicalDecodingContext and we can probably
share it via shared memory?

I guess we need to create a DSM each time a walsender opens a
connection, at START_REPLICATION time. Then ALTER PUBLICATION
needs to connect to all DSMs of all running walsenders and see if
they are reading from it. Is that what you have in mind?
Alternatively, we could have one DSM per publication with a PID
array of all walsenders that are sending it (each walsender needs to

add its PID as it starts).

The latter might be better.

While thinking about using DSM here, I came across one of your
commits
f2f9fcb303 which seems to indicate that it is not a good idea to
rely on it but I think you have changed dynamic shared memory to
fixed shared memory usage because that was more suitable rather
than DSM is not portable. Because I see a commit bcbd940806 where
we have removed the 'none' option of dynamic_shared_memory_type.
So, I think it should be okay to use DSM in this context. What do you

think?

Why would any of this be needed?

ALTER PUBLICATION will invalidate the RelationSyncEntry entries in
all walsenders, no? So AFAICS it should be enough to enforce the
limitations in get_rel_sync_entry,

Yes, that should be sufficient to enforce limitations in
get_rel_sync_entry() but it will lead to the following behavior:
a. The Alter Publication command will be successful but later in the
logs, the error will be logged and the user needs to check it and
take appropriate action. Till that time the walsender will be in an
error loop which means it will restart and again lead to the same
error till the user takes some action.
b. As we use historic snapshots, so even after the user takes action
say by changing publication, it won't be reflected. So, the option
for the user would be to drop their subscription.

Am I missing something? If not, then are we okay with such behavior?
If yes, then I think it would be much easier implementation-wise and
probably advisable at this point. We can document it so that users
are careful and can take necessary action if they get into such a
situation. Anyway, we can improve this in the future as you also
suggested earlier.

which is necessary anyway because the subscriber may not be
connected when ALTER PUBLICATION gets executed.

If we are not okay with the resultant behavior of detecting this in
get_rel_sync_entry(), then we can solve this in some other way as
Alvaro has indicated in one of his responses which is to detect that
at start replication time probably in the subscriber-side.

IMO that behavior is acceptable.

Fair enough, then we should go with a simpler approach to detect it in
pgoutput.c (get_rel_sync_entry).

OK, here is a patch that tries to check the column list in that way. The patch
also checks the column list during CREATE SUBSCRIPTION and when starting the
initial copy.
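So, with the patch applied, a setup along these lines (names invented) should
now be rejected, using the message added by the patch:

```sql
-- Publisher: same table, different column lists.
CREATE TABLE t (a int PRIMARY KEY, b int, c int);
CREATE PUBLICATION p1 FOR TABLE t (a, b);
CREATE PUBLICATION p2 FOR TABLE t (a, c);

-- Subscriber: combining the two publications now triggers a WARNING
-- here (fetch_table_list) and an ERROR during the initial sync:
--   cannot use different column lists for table "public.t"
--   in different publications
CREATE SUBSCRIPTION s CONNECTION '...' PUBLICATION p1, p2;
```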

Best regards,
Hou zj

Attachments:

0001-Disallow-combining-publication-when-column-list-is-d.patch
From f7f68fd11211ffbe5770fb565ac80620dd6ccea9 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Wed, 11 May 2022 14:29:05 +0800
Subject: [PATCH] Disallow combining publication when column list is different
 for the same table

The main purposes of introducing a column list are to statically have
different shapes on the publisher and subscriber, or to hide sensitive
column data. In both cases, it doesn't seem to make sense to combine
column lists. So disallow the cases where the column list is different
for the same table when combining publications.
---
 src/backend/commands/subscriptioncmds.c     |  37 +++++++--
 src/backend/replication/logical/tablesync.c |  61 +++++++++-----
 src/backend/replication/pgoutput/pgoutput.c |  78 ++++++++---------
 src/test/subscription/t/031_column_list.pl  | 124 +++++++++-------------------
 4 files changed, 154 insertions(+), 146 deletions(-)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b94236f..8650cc5 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1753,7 +1753,8 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
 
 /*
  * Get the list of tables which belong to specified publications on the
- * publisher connection.
+ * publisher connection. Also get the column list for each table and check if
+ * column lists are the same in different publications.
  */
 static List *
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
@@ -1761,17 +1762,34 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 	WalRcvExecResult *res;
 	StringInfoData cmd;
 	TupleTableSlot *slot;
-	Oid			tableRow[2] = {TEXTOID, TEXTOID};
+	Oid			tableRow[3] = {TEXTOID, TEXTOID, INT2VECTOROID};
 	List	   *tablelist = NIL;
 
 	initStringInfo(&cmd);
-	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
-						   "  FROM pg_catalog.pg_publication_tables t\n"
+	appendStringInfoString(&cmd,
+						   "SELECT DISTINCT t.schemaname,\n"
+						   "                t.tablename,\n"
+						   "                (CASE WHEN (array_length(pr.prattrs, 1) = t.relnatts)\n"
+						   "                THEN NULL ELSE pr.prattrs END)\n"
+						   "  FROM (SELECT P.pubname AS pubname,\n"
+						   "               N.nspname AS schemaname,\n"
+						   "               C.relname AS tablename,\n"
+						   "               P.oid AS pubid,\n"
+						   "               C.oid AS reloid,\n"
+						   "               C.relnatts\n"
+						   "          FROM pg_publication P,\n"
+						   "          LATERAL pg_get_publication_tables(P.pubname) GPT,\n"
+						   "          pg_class C JOIN pg_namespace N\n"
+						   "                     ON (N.oid = C.relnamespace)\n"
+						   "          WHERE C.oid = GPT.relid) t\n"
+						   "  LEFT OUTER JOIN pg_publication_rel pr\n"
+						   "       ON (t.pubid = pr.prpubid AND\n"
+						   "	       pr.prrelid = reloid)\n"
 						   " WHERE t.pubname IN (");
 	get_publications_str(publications, &cmd, true);
 	appendStringInfoChar(&cmd, ')');
 
-	res = walrcv_exec(wrconn, cmd.data, 2, tableRow);
+	res = walrcv_exec(wrconn, cmd.data, 3, tableRow);
 	pfree(cmd.data);
 
 	if (res->status != WALRCV_OK_TUPLES)
@@ -1795,7 +1813,14 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 		Assert(!isnull);
 
 		rv = makeRangeVar(nspname, relname, -1);
-		tablelist = lappend(tablelist, rv);
+
+		if (list_member(tablelist, rv))
+			ereport(WARNING,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+		else
+			tablelist = lappend(tablelist, rv);
 
 		ExecClearTuple(slot);
 	}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 49ceec3..8ac4171 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -771,7 +771,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	{
 		WalRcvExecResult *pubres;
 		TupleTableSlot *slot;
-		Oid			attrsRow[] = {INT2OID};
+		Oid			attrsRow[] = {INT2VECTOROID};
 		StringInfoData pub_names;
 		bool		first = true;
 
@@ -786,21 +786,20 @@ fetch_remote_table_info(char *nspname, char *relname,
 
 		/*
 		 * Fetch info about column lists for the relation (from all the
-		 * publications). We unnest the int2vector values, because that
-		 * makes it easier to combine lists by simply adding the attnums
-		 * to a new bitmap (without having to parse the int2vector data).
-		 * This preserves NULL values, so that if one of the publications
-		 * has no column list, we'll know that.
+		 * publications).
 		 */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT unnest"
+						 "SELECT DISTINCT"
+						 "  (CASE WHEN (array_length(pr.prattrs, 1) = c.relnatts)"
+						 "   THEN NULL ELSE pr.prattrs END)"
 						 "  FROM pg_publication p"
 						 "  LEFT OUTER JOIN pg_publication_rel pr"
 						 "       ON (p.oid = pr.prpubid AND pr.prrelid = %u)"
 						 "  LEFT OUTER JOIN unnest(pr.prattrs) ON TRUE,"
-						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
-						 " WHERE gpt.relid = %u"
+						 "  LATERAL pg_get_publication_tables(p.pubname) gpt,"
+						 "  pg_class c"
+						 " WHERE gpt.relid = %u AND c.oid = gpt.relid"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
 						 lrel->remoteid,
@@ -815,27 +814,49 @@ fetch_remote_table_info(char *nspname, char *relname,
 					 errmsg("could not fetch column list info for table \"%s.%s\" from publisher: %s",
 							nspname, relname, pubres->err)));
 
+		first = true;
+
 		/*
-		 * Merge the column lists (from different publications) by creating
-		 * a single bitmap with all the attnums. If we find a NULL value,
-		 * that means one of the publications has no column list for the
-		 * table we're syncing.
+		 * Traverse the column lists from different publications and build a
+		 * single bitmap with the attnums.
+		 *
+		 * During the loop, check whether all the column lists are the same and
+		 * report an error if not.
+		 *
+		 * If we find a NULL value, that means one of the publications has no
+		 * column list for the table we're syncing.
 		 */
 		slot = MakeSingleTupleTableSlot(pubres->tupledesc, &TTSOpsMinimalTuple);
 		while (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
 		{
 			Datum	cfval = slot_getattr(slot, 1, &isnull);
+			Bitmapset  *cols = NULL;
 
-			/* NULL means empty column list, so we're done. */
-			if (isnull)
+			if (!isnull)
 			{
-				bms_free(included_cols);
-				included_cols = NULL;
-				break;
+				ArrayType  *arr;
+				int			nelems;
+				int16	   *elems;
+
+				arr = DatumGetArrayTypeP(cfval);
+				nelems = ARR_DIMS(arr)[0];
+				elems = (int16 *) ARR_DATA_PTR(arr);
+
+				for (int i = 0; i < nelems; i++)
+					cols = bms_add_member(cols, elems[i]);
 			}
 
-			included_cols = bms_add_member(included_cols,
-										DatumGetInt16(cfval));
+			/* NULL means empty column list. */
+			if (first)
+			{
+				included_cols = cols;
+				first = false;
+			}
+			else if (!bms_equal(included_cols, cols))
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+							   nspname, relname));
 
 			ExecClearTuple(slot);
 		}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index b197bfd..7ca09de 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -803,16 +803,13 @@ pgoutput_row_filter_exec_expr(ExprState *state, ExprContext *econtext)
  * Make sure the per-entry memory context exists.
  */
 static void
-pgoutput_ensure_entry_cxt(PGOutputData *data, RelationSyncEntry *entry)
+pgoutput_ensure_entry_cxt(PGOutputData *data, RelationSyncEntry *entry,
+						  Relation relation)
 {
-	Relation	relation;
-
 	/* The context may already exist, in which case bail out. */
 	if (entry->entry_cxt)
 		return;
 
-	relation = RelationIdGetRelation(entry->publish_as_relid);
-
 	entry->entry_cxt = AllocSetContextCreate(data->cachectx,
 											 "entry private context",
 											 ALLOCSET_SMALL_SIZES);
@@ -941,7 +938,7 @@ pgoutput_row_filter_init(PGOutputData *data, List *publications,
 	{
 		Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
 
-		pgoutput_ensure_entry_cxt(data, entry);
+		pgoutput_ensure_entry_cxt(data, entry, relation);
 
 		/*
 		 * Now all the filters for all pubactions are known. Combine them when
@@ -978,14 +975,20 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 						  RelationSyncEntry *entry)
 {
 	ListCell   *lc;
+	bool		first = true;
+	Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
 
 	/*
 	 * Find if there are any column lists for this relation. If there are,
-	 * build a bitmap merging all the column lists.
+	 * build a bitmap using the column lists.
 	 *
-	 * All the given publication-table mappings must be checked.
+	 * Note that we don't support the case where the column list is different
+	 * for the same table when combining publications. But we still need to
+	 * check all the given publication-table mappings and report an error if
+	 * any publications have a different column list.
 	 *
-	 * Multiple publications might have multiple column lists for this relation.
+	 * Multiple publications might have multiple column lists for this
+	 * relation.
 	 *
 	 * FOR ALL TABLES and FOR ALL TABLES IN SCHEMA implies "don't use column
 	 * list" so it takes precedence.
@@ -995,12 +998,7 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		Publication *pub = lfirst(lc);
 		HeapTuple	cftuple = NULL;
 		Datum		cfdatum = 0;
-
-		/*
-		 * Assume there's no column list. Only if we find pg_publication_rel
-		 * entry with a column list we'll switch it to false.
-		 */
-		bool		pub_no_list = true;
+		Bitmapset	*cols = NULL;
 
 		/*
 		 * If the publication is FOR ALL TABLES then it is treated the same as if
@@ -1008,6 +1006,8 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		 */
 		if (!pub->alltables)
 		{
+			bool		pub_no_list = true;
+
 			/*
 			 * Check for the presence of a column list in this publication.
 			 *
@@ -1033,39 +1033,43 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 
 				/*
 				 * Build the column list bitmap in the per-entry context.
-				 *
-				 * We need to merge column lists from all publications, so we
-				 * update the same bitmapset. If the column list is null, we
-				 * interpret it as replicating all columns.
 				 */
 				if (!pub_no_list)	/* when not null */
 				{
-					pgoutput_ensure_entry_cxt(data, entry);
+					pgoutput_ensure_entry_cxt(data, entry, relation);
+
+					cols = pub_collist_to_bitmapset(cols, cfdatum,
+													entry->entry_cxt);
 
-					entry->columns = pub_collist_to_bitmapset(entry->columns,
-															  cfdatum,
-															  entry->entry_cxt);
+					/*
+					 * If column list includes all the columns of the table,
+					 * set it to NULL.
+					 */
+					if (bms_num_members(cols) == RelationGetNumberOfAttributes(relation))
+					{
+						bms_free(cols);
+						cols = NULL;
+					}
 				}
+
+				ReleaseSysCache(cftuple);
 			}
 		}
 
-		/*
-		 * Found a publication with no column list, so we're done. But first
-		 * discard column list we might have from preceding publications.
-		 */
-		if (pub_no_list)
+		if (first)
 		{
-			if (cftuple)
-				ReleaseSysCache(cftuple);
-
-			bms_free(entry->columns);
-			entry->columns = NULL;
-
-			break;
+			entry->columns = cols;
+			first = false;
 		}
-
-		ReleaseSysCache(cftuple);
+		else if (!bms_equal(entry->columns, cols))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   get_namespace_name(RelationGetNamespace(relation)),
+						   RelationGetRelationName(relation)));
 	}							/* loop all subscribed publications */
+
+	RelationClose(relation);
 }
 
 /*
diff --git a/src/test/subscription/t/031_column_list.pl b/src/test/subscription/t/031_column_list.pl
index bdcf3e4..3dfe7d1 100644
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -21,6 +21,8 @@ $node_subscriber->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 
+my $offset = 0;
+
 sub wait_for_subscription_sync
 {
 	my ($node) = @_;
@@ -334,12 +336,12 @@ is($result, qq(1|abc
 2|xyz), 'update on column tab2.c is not replicated');
 
 
-# TEST: add a table to two publications with different column lists, and
+# TEST: add a table to two publications with the same column list, and
 # create a single subscription replicating both publications
 $node_publisher->safe_psql('postgres', qq(
 	CREATE TABLE tab5 (a int PRIMARY KEY, b int, c int, d int);
 	CREATE PUBLICATION pub2 FOR TABLE tab5 (a, b);
-	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, d);
+	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, b);
 
 	-- insert a couple initial records
 	INSERT INTO tab5 VALUES (1, 11, 111, 1111);
@@ -358,8 +360,7 @@ wait_for_subscription_sync($node_subscriber);
 
 $node_publisher->wait_for_catchup('sub1');
 
-# insert data and make sure all the columns (union of the columns lists)
-# get fully replicated
+# insert data and make sure all the columns get fully replicated
 $node_publisher->safe_psql('postgres', qq(
 	INSERT INTO tab5 VALUES (3, 33, 333, 3333);
 	INSERT INTO tab5 VALUES (4, 44, 444, 4444);
@@ -368,39 +369,11 @@ $node_publisher->safe_psql('postgres', qq(
 $node_publisher->wait_for_catchup('sub1');
 
 is($node_subscriber->safe_psql('postgres',"SELECT * FROM tab5 ORDER BY a"),
-   qq(1|11|1111
-2|22|2222
-3|33|3333
-4|44|4444),
-   'overlapping publications with overlapping column lists');
-
-# and finally, remove the column list for one of the publications, which
-# means replicating all columns (removing the column list), but first add
-# the missing column to the table on subscriber
-$node_publisher->safe_psql('postgres', qq(
-	ALTER PUBLICATION pub3 SET TABLE tab5;
-));
-
-$node_subscriber->safe_psql('postgres', qq(
-	ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;
-	ALTER TABLE tab5 ADD COLUMN c INT;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql('postgres', qq(
-	INSERT INTO tab5 VALUES (5, 55, 555, 5555);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is($node_subscriber->safe_psql('postgres',"SELECT * FROM tab5 ORDER BY a"),
-   qq(1|11|1111|
-2|22|2222|
-3|33|3333|
-4|44|4444|
-5|55|5555|555),
-   'overlapping publications with overlapping column lists');
+   qq(1|11|
+2|22|
+3|33|
+4|44|),
+   'insert on column tab5.d is not replicated');
 
 # TEST: create a table with a column list, then change the replica
 # identity by replacing a primary key (but use a different column in
@@ -822,51 +795,18 @@ is($node_subscriber->safe_psql('postgres',"SELECT * FROM test_part_d ORDER BY a,
 3|),
    'partitions with different replica identities not replicated correctly');
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. So with column lists (a,b) and (a,c) we
-# should replicate (a,b,c).
-
-$node_publisher->safe_psql('postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
-	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
-
-	-- initial data
-	INSERT INTO test_mix_1 VALUES (1, 2, 3);
-));
-
-$node_subscriber->safe_psql('postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_mix_1, pub_mix_2;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql('postgres', qq(
-	INSERT INTO test_mix_1 VALUES (4, 5, 6);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is($node_subscriber->safe_psql('postgres',"SELECT * FROM test_mix_1 ORDER BY a"),
-   qq(1|2|3
-4|5|6),
-   'a mix of publications should use a union of column list');
-
-
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES, we should replicate all columns.
+# TEST: If a table is included in a publication that is FOR ALL TABLES, all
+# columns are replicated.
 
 # drop unnecessary tables, so as not to interfere with the FOR ALL TABLES
 $node_publisher->safe_psql('postgres', qq(
-	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7, test_mix_1,
+	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7,
 			   test_part, test_part_a, test_part_b, test_part_c, test_part_d;
 ));
 
 $node_publisher->safe_psql('postgres', qq(
 	CREATE TABLE test_mix_2 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b);
+	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b, c);
 	CREATE PUBLICATION pub_mix_4 FOR ALL TABLES;
 
 	-- initial data
@@ -890,12 +830,11 @@ $node_publisher->wait_for_catchup('sub1');
 is($node_subscriber->safe_psql('postgres',"SELECT * FROM test_mix_2"),
    qq(1|2|3
 4|5|6),
-   'a mix of publications should use a union of column list');
+   'all columns should be replicated');
 
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES IN SCHEMA, we should replicate all columns.
+# TEST: If a table is included in a publication that is FOR ALL TABLES IN
+# SCHEMA, all columns are replicated.
 
 $node_subscriber->safe_psql('postgres', qq(
 	DROP SUBSCRIPTION sub1;
@@ -905,7 +844,7 @@ $node_subscriber->safe_psql('postgres', qq(
 $node_publisher->safe_psql('postgres', qq(
 	DROP TABLE test_mix_2;
 	CREATE TABLE test_mix_3 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b);
+	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b, c);
 	CREATE PUBLICATION pub_mix_6 FOR ALL TABLES IN SCHEMA public;
 
 	-- initial data
@@ -927,8 +866,7 @@ $node_publisher->wait_for_catchup('sub1');
 is($node_subscriber->safe_psql('postgres',"SELECT * FROM test_mix_3"),
    qq(1|2|3
 4|5|6),
-   'a mix of publications should use a union of column list');
-
+   'all columns should be replicated');
 
 # TEST: Check handling of publish_via_partition_root - if a partition is
 # published through partition root, we should only apply the column list
@@ -979,7 +917,7 @@ is($node_subscriber->safe_psql('postgres',"SELECT * FROM test_root ORDER BY a, b
 # TEST: Multiple publications which publish schema of parent table and
 # partition. The partition is published through two publications, once
 # through a schema (so no column list) containing the parent, and then
-# also directly (with a columns list). The expected outcome is there is
+# also directly (all columns). The expected outcome is there is
 # no column list.
 
 $node_publisher->safe_psql('postgres', qq(
@@ -990,7 +928,7 @@ $node_publisher->safe_psql('postgres', qq(
 	CREATE TABLE t_1 PARTITION OF s1.t FOR VALUES FROM (1) TO (10);
 
 	CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA s1;
-	CREATE PUBLICATION pub2 FOR TABLE t_1(b);
+	CREATE PUBLICATION pub2 FOR TABLE t_1(a, b, c);
 
 	-- initial data
 	INSERT INTO s1.t VALUES (1, 2, 3);
@@ -1124,6 +1062,26 @@ is($node_subscriber->safe_psql('postgres',"SELECT * FROM t ORDER BY a, b, c"),
 4||),
    'publication containing both parent and child relation');
 
+# TEST: With a table included in multiple publications with different column
+# lists, we should find the error in the log
+
+$node_publisher->safe_psql('postgres', qq(
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
+	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_mix_1, pub_mix_2 WITH (copy_data = false);
+));
+
+$node_publisher->wait_for_catchup('sub1');
+
+my $logfile = slurp_file($node_subscriber->logfile, $offset);
+ok( $logfile =~
+	  qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	'different column lists detected');
 
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
-- 
2.7.2.windows.1

#57Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#56)
Re: bogus: logical replication rows/cols combinations

On Wed, May 11, 2022 at 12:55 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, May 11, 2022 11:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Fair enough, then we should go with a simpler approach to detect it in
pgoutput.c (get_rel_sync_entry).

OK, here is the patch that tries to check the column list in that way. The patch also
checks the column list when CREATE SUBSCRIPTION and when starting the initial copy.

Few comments:
===============
1.
initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
-    "  FROM pg_catalog.pg_publication_tables t\n"
+ appendStringInfoString(&cmd,
+    "SELECT DISTINCT t.schemaname,\n"
+    "                t.tablename,\n"
+    "                (CASE WHEN (array_length(pr.prattrs, 1) = t.relnatts)\n"
+    "                THEN NULL ELSE pr.prattrs END)\n"
+    "  FROM (SELECT P.pubname AS pubname,\n"
+    "               N.nspname AS schemaname,\n"
+    "               C.relname AS tablename,\n"
+    "               P.oid AS pubid,\n"
+    "               C.oid AS reloid,\n"
+    "               C.relnatts\n"
+    "          FROM pg_publication P,\n"
+    "          LATERAL pg_get_publication_tables(P.pubname) GPT,\n"
+    "          pg_class C JOIN pg_namespace N\n"
+    "                     ON (N.oid = C.relnamespace)\n"
+    "          WHERE C.oid = GPT.relid) t\n"
+    "  LEFT OUTER JOIN pg_publication_rel pr\n"
+    "       ON (t.pubid = pr.prpubid AND\n"
+    "        pr.prrelid = reloid)\n"

Can we modify pg_publication_tables to get the row filter and column
list and then use it directly instead of constructing this query?

2.
+ if (list_member(tablelist, rv))
+ ereport(WARNING,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use different column lists for table \"%s.%s\" in
different publications",
+    nspname, relname));
+ else

Can we write comments to explain why we are using WARNING here instead of ERROR?

3.
static void
-pgoutput_ensure_entry_cxt(PGOutputData *data, RelationSyncEntry *entry)
+pgoutput_ensure_entry_cxt(PGOutputData *data, RelationSyncEntry *entry,
+   Relation relation)

What is the need to change this interface as part of this patch?

--
With Regards,
Amit Kapila.
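For readers following the thread, the two combining semantics under discussion can be sketched in Python (hypothetical helper names, not PostgreSQL code): the pre-patch behavior merged column lists from all publications into a union, while the proposed behavior keeps the first list and errors out if any later publication uses a different one.

```python
def combine_union(column_lists):
    # Pre-patch behavior, sketched for illustration: merge the column lists
    # from all publications into a union. A None entry (a publication with
    # no column list) means "replicate all columns" and wins outright.
    merged = set()
    for cols in column_lists:
        if cols is None:
            return None
        merged |= set(cols)
    return merged


def combine_strict(column_lists):
    # Behavior the patch enforces: keep the first list, and require every
    # later list to be equal to it, otherwise raise an error (mirroring the
    # "cannot use different column lists" ereport in the patch).
    result = None
    first = True
    for cols in column_lists:
        cols = frozenset(cols) if cols is not None else None
        if first:
            result, first = cols, False
        elif result != cols:
            raise ValueError("cannot use different column lists for table "
                             "in different publications")
    return result
```

Under the union rule, publications `(a, b)` and `(a, c)` replicate `(a, b, c)`; under the strict rule the same pair is rejected.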

#58Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#57)
Re: bogus: logical replication rows/cols combinations

On Thu, May 12, 2022 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 11, 2022 at 12:55 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, May 11, 2022 11:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Fair enough, then we should go with a simpler approach to detect it in
pgoutput.c (get_rel_sync_entry).

OK, here is the patch that tries to check the column list in that way. The patch also
checks the column list when CREATE SUBSCRIPTION and when starting the initial copy.

Few comments:
===============

...

One more point, I think we should update the docs for this.

--
With Regards,
Amit Kapila.
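One detail of the patch worth calling out: both the query (`CASE WHEN array_length(pr.prattrs, 1) = t.relnatts THEN NULL ...`) and the pgoutput code (`bms_num_members(cols) == RelationGetNumberOfAttributes(relation)`) fold a column list that names every column of the table into "no column list". That normalization can be sketched as (a hypothetical helper, not PostgreSQL code):

```python
def normalize_column_list(collist, relnatts):
    # If the column list covers every attribute of the table, treat it the
    # same as having no column list at all (represented as None), so the
    # later equality check doesn't reject e.g. (a, b, c) vs. an absent list
    # on a three-column table.
    if collist is None:
        return None
    cols = frozenset(collist)
    return None if len(cols) == relnatts else cols
```

Without this step, a publication listing all columns explicitly and a publication with no list would be flagged as "different column lists" even though they replicate identically.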

#59houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#57)
2 attachment(s)
RE: bogus: logical replication rows/cols combinations

On Thursday, May 12, 2022 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 11, 2022 at 12:55 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, May 11, 2022 11:33 AM Amit Kapila

<amit.kapila16@gmail.com> wrote:

Fair enough, then we should go with a simpler approach to detect it
in pgoutput.c (get_rel_sync_entry).

OK, here is the patch that tries to check the column list in that way. The
patch also checks the column list when CREATE SUBSCRIPTION and when

starting the initial copy.

Few comments:
===============
1.
initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname,
t.tablename\n"
-    "  FROM pg_catalog.pg_publication_tables t\n"
+ appendStringInfoString(&cmd,
+    "SELECT DISTINCT t.schemaname,\n"
+    "                t.tablename,\n"
+    "                (CASE WHEN (array_length(pr.prattrs, 1) = t.relnatts)\n"
+    "                THEN NULL ELSE pr.prattrs END)\n"
+    "  FROM (SELECT P.pubname AS pubname,\n"
+    "               N.nspname AS schemaname,\n"
+    "               C.relname AS tablename,\n"
+    "               P.oid AS pubid,\n"
+    "               C.oid AS reloid,\n"
+    "               C.relnatts\n"
+    "          FROM pg_publication P,\n"
+    "          LATERAL pg_get_publication_tables(P.pubname) GPT,\n"
+    "          pg_class C JOIN pg_namespace N\n"
+    "                     ON (N.oid = C.relnamespace)\n"
+    "          WHERE C.oid = GPT.relid) t\n"
+    "  LEFT OUTER JOIN pg_publication_rel pr\n"
+    "       ON (t.pubid = pr.prpubid AND\n"
+    "        pr.prrelid = reloid)\n"

Can we modify pg_publication_tables to get the row filter and column list and
then use it directly instead of constructing this query?

Agreed. If we can get the column list and row filter from pg_publication_tables, it
will be more convenient. And I think users who want to fetch the column list
and row filter of a table can also benefit from it.

2.
+ if (list_member(tablelist, rv))
+ ereport(WARNING,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use different column lists for table \"%s.%s\" in
different publications",
+    nspname, relname));
+ else

Can we write comments to explain why we are using WARNING here instead of
ERROR?

3.
static void
-pgoutput_ensure_entry_cxt(PGOutputData *data, RelationSyncEntry *entry)
+pgoutput_ensure_entry_cxt(PGOutputData *data, RelationSyncEntry *entry,
+   Relation relation)

What is the need to change this interface as part of this patch?

Attached is the new version patch, which addresses these comments and updates the
documentation. The 0001 patch extends the view and the 0002 patch adds the
restriction on column lists.

Best regards,
Hou zj
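The duplicate check the patch adds to fetch_table_list() relies on the query now selecting DISTINCT (schemaname, tablename, columnlist): a table published with two different column lists comes back as two rows, and the second occurrence only draws a WARNING so that CREATE SUBSCRIPTION can still succeed. A rough sketch of that logic (illustrative only, not the C implementation):

```python
def fetch_table_list_sketch(rows):
    # rows mimics the result of the DISTINCT (schemaname, tablename,
    # columnlist) query: a table subscribed with two different column lists
    # appears twice. The second occurrence is reported as a warning and
    # skipped, so the subscription can still be created and the user can
    # fix the publications afterwards.
    tables, warnings = [], []
    for schema, table, collist in rows:
        key = (schema, table)
        if any(t == key for t, _ in tables):
            warnings.append('cannot use different column lists for table '
                            f'"{schema}.{table}" in different publications')
        else:
            tables.append((key, collist))
    return [t for t, _ in tables], warnings
```

The actual failure is deferred to the walsender/tablesync side, where the same condition raises an ERROR.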

Attachments:

0002-Disallow-combining-publication-when-column-list-is-d.patch
From e1fb8c867353262a43419ee1892bbd64f4d613d7 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Fri, 13 May 2022 13:19:49 +0800
Subject: [PATCH] Disallow combining publication when column list is different
 for the same table

The main purpose of introducing a column list is to statically have different
shapes on publisher and subscriber, or to hide sensitive column data. In both
cases, it doesn't seem to make sense to combine column lists. So disallow the
cases where the column list is different for the same table when combining
publications.
---
 src/backend/commands/subscriptioncmds.c     |  24 ++++--
 src/backend/replication/logical/tablesync.c |  61 ++++++++-----
 src/backend/replication/pgoutput/pgoutput.c |  75 ++++++++--------
 src/test/subscription/t/031_column_list.pl  | 127 +++++++++-------------------
 4 files changed, 137 insertions(+), 150 deletions(-)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 690cdaa..c19eb81 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1753,7 +1753,8 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
 
 /*
  * Get the list of tables which belong to specified publications on the
- * publisher connection.
+ * publisher connection. Also get the column list for each table and check if
+ * column lists are the same in different publications.
  */
 static List *
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
@@ -1761,17 +1762,18 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 	WalRcvExecResult *res;
 	StringInfoData cmd;
 	TupleTableSlot *slot;
-	Oid			tableRow[2] = {TEXTOID, TEXTOID};
+	Oid			tableRow[3] = {TEXTOID, TEXTOID, INT2VECTOROID};
 	List	   *tablelist = NIL;
 
 	initStringInfo(&cmd);
-	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
+	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename, \n"
+						   "                      t.columnlist\n"
 						   "  FROM pg_catalog.pg_publication_tables t\n"
 						   " WHERE t.pubname IN (");
 	get_publications_str(publications, &cmd, true);
 	appendStringInfoChar(&cmd, ')');
 
-	res = walrcv_exec(wrconn, cmd.data, 2, tableRow);
+	res = walrcv_exec(wrconn, cmd.data, 3, tableRow);
 	pfree(cmd.data);
 
 	if (res->status != WALRCV_OK_TUPLES)
@@ -1795,7 +1797,19 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 		Assert(!isnull);
 
 		rv = makeRangeVar(nspname, relname, -1);
-		tablelist = lappend(tablelist, rv);
+
+		/*
+		 * We only throw a warning here so that the subscription can still be
+		 * created, to let the user be aware that something is going to fail
+		 * later so they can fix the publications afterwards.
+		 */
+		if (list_member(tablelist, rv))
+			ereport(WARNING,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+		else
+			tablelist = lappend(tablelist, rv);
 
 		ExecClearTuple(slot);
 	}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 994c7a0..42de832 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -771,7 +771,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	{
 		WalRcvExecResult *pubres;
 		TupleTableSlot *slot;
-		Oid			attrsRow[] = {INT2OID};
+		Oid			attrsRow[] = {INT2VECTOROID};
 		StringInfoData pub_names;
 		bool		first = true;
 
@@ -786,19 +786,17 @@ fetch_remote_table_info(char *nspname, char *relname,
 
 		/*
 		 * Fetch info about column lists for the relation (from all the
-		 * publications). We unnest the int2vector values, because that makes
-		 * it easier to combine lists by simply adding the attnums to a new
-		 * bitmap (without having to parse the int2vector data). This
-		 * preserves NULL values, so that if one of the publications has no
-		 * column list, we'll know that.
+		 * publications).
 		 */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT unnest"
+						 "SELECT DISTINCT"
+						 "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
+						 "   THEN NULL ELSE gpt.attrs END)"
 						 "  FROM pg_publication p,"
-						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
-						 "  LEFT OUTER JOIN unnest(gpt.attrs) ON TRUE"
-						 " WHERE gpt.relid = %u"
+						 "  LATERAL pg_get_publication_tables(p.pubname) gpt,"
+						 "  pg_class c"
+						 " WHERE gpt.relid = %u AND c.oid = gpt.relid"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
 						 pub_names.data);
@@ -812,27 +810,48 @@ fetch_remote_table_info(char *nspname, char *relname,
 					 errmsg("could not fetch column list info for table \"%s.%s\" from publisher: %s",
 							nspname, relname, pubres->err)));
 
+		first = true;
+
 		/*
-		 * Merge the column lists (from different publications) by creating a
-		 * single bitmap with all the attnums. If we find a NULL value, that
-		 * means one of the publications has no column list for the table
-		 * we're syncing.
+		 * Traverse the column lists from different publications and build a
+		 * single bitmap with the attnums.
+		 *
+		 * During the loop, check whether all the column lists are the same
+		 * and report an error if not.
+		 *
+		 * If we find a NULL value, that means one of the publications has no
+		 * column list for the table we're syncing.
 		 */
 		slot = MakeSingleTupleTableSlot(pubres->tupledesc, &TTSOpsMinimalTuple);
 		while (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
 		{
 			Datum		cfval = slot_getattr(slot, 1, &isnull);
+			Bitmapset  *cols = NULL;
 
-			/* NULL means empty column list, so we're done. */
-			if (isnull)
+			if (!isnull)
 			{
-				bms_free(included_cols);
-				included_cols = NULL;
-				break;
+				ArrayType  *arr;
+				int			nelems;
+				int16	   *elems;
+
+				arr = DatumGetArrayTypeP(cfval);
+				nelems = ARR_DIMS(arr)[0];
+				elems = (int16 *) ARR_DATA_PTR(arr);
+
+				for (int i = 0; i < nelems; i++)
+					cols = bms_add_member(cols, elems[i]);
 			}
 
-			included_cols = bms_add_member(included_cols,
-										   DatumGetInt16(cfval));
+			if (first)
+			{
+				included_cols = cols;
+				first = false;
+			}
+			else if (!bms_equal(included_cols, cols))
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+							   nspname, relname));
 
 			ExecClearTuple(slot);
 		}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 42c06af..4f58df2 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -979,12 +979,17 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 						  RelationSyncEntry *entry)
 {
 	ListCell   *lc;
+	bool		first = true;
+	Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
 
 	/*
 	 * Find if there are any column lists for this relation. If there are,
-	 * build a bitmap merging all the column lists.
+	 * build a bitmap using the column lists.
 	 *
-	 * All the given publication-table mappings must be checked.
+	 * Note that we don't support the case where the column list is different
+	 * for the same table when combining publications. But we still need to
+	 * check all the given publication-table mappings and report an error if
+	 * any publications have a different column list.
 	 *
 	 * Multiple publications might have multiple column lists for this
 	 * relation.
@@ -997,12 +1002,7 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		Publication *pub = lfirst(lc);
 		HeapTuple	cftuple = NULL;
 		Datum		cfdatum = 0;
-
-		/*
-		 * Assume there's no column list. Only if we find pg_publication_rel
-		 * entry with a column list we'll switch it to false.
-		 */
-		bool		pub_no_list = true;
+		Bitmapset	*cols = NULL;
 
 		/*
 		 * If the publication is FOR ALL TABLES then it is treated the same as
@@ -1011,6 +1011,8 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		 */
 		if (!pub->alltables)
 		{
+			bool		pub_no_list = true;
+
 			/*
 			 * Check for the presence of a column list in this publication.
 			 *
@@ -1024,51 +1026,48 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 
 			if (HeapTupleIsValid(cftuple))
 			{
-				/*
-				 * Lookup the column list attribute.
-				 *
-				 * Note: We update the pub_no_list value directly, because if
-				 * the value is NULL, we have no list (and vice versa).
-				 */
+				/* Lookup the column list attribute. */
 				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
 										  Anum_pg_publication_rel_prattrs,
 										  &pub_no_list);
 
-				/*
-				 * Build the column list bitmap in the per-entry context.
-				 *
-				 * We need to merge column lists from all publications, so we
-				 * update the same bitmapset. If the column list is null, we
-				 * interpret it as replicating all columns.
-				 */
+				/* Build the column list bitmap in the per-entry context. */
 				if (!pub_no_list)	/* when not null */
 				{
 					pgoutput_ensure_entry_cxt(data, entry);
 
-					entry->columns = pub_collist_to_bitmapset(entry->columns,
-															  cfdatum,
-															  entry->entry_cxt);
+					cols = pub_collist_to_bitmapset(cols, cfdatum,
+													entry->entry_cxt);
+
+					/*
+					 * If column list includes all the columns of the table,
+					 * set it to NULL.
+					 */
+					if (bms_num_members(cols) == RelationGetNumberOfAttributes(relation))
+					{
+						bms_free(cols);
+						cols = NULL;
+					}
 				}
+
+				ReleaseSysCache(cftuple);
 			}
 		}
 
-		/*
-		 * Found a publication with no column list, so we're done. But first
-		 * discard column list we might have from preceding publications.
-		 */
-		if (pub_no_list)
+		if (first)
 		{
-			if (cftuple)
-				ReleaseSysCache(cftuple);
-
-			bms_free(entry->columns);
-			entry->columns = NULL;
-
-			break;
+			entry->columns = cols;
+			first = false;
 		}
-
-		ReleaseSysCache(cftuple);
+		else if (!bms_equal(entry->columns, cols))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   get_namespace_name(RelationGetNamespace(relation)),
+						   RelationGetRelationName(relation)));
 	}							/* loop all subscribed publications */
+
+	RelationClose(relation);
 }
 
 /*
diff --git a/src/test/subscription/t/031_column_list.pl b/src/test/subscription/t/031_column_list.pl
index 19812e1..b454a05 100644
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -21,6 +21,8 @@ $node_subscriber->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 
+my $offset = 0;
+
 sub wait_for_subscription_sync
 {
 	my ($node) = @_;
@@ -361,13 +363,13 @@ is( $result, qq(1|abc
 2|xyz), 'update on column tab2.c is not replicated');
 
 
-# TEST: add a table to two publications with different column lists, and
+# TEST: add a table to two publications with the same column list, and
 # create a single subscription replicating both publications
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE tab5 (a int PRIMARY KEY, b int, c int, d int);
 	CREATE PUBLICATION pub2 FOR TABLE tab5 (a, b);
-	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, d);
+	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, b);
 
 	-- insert a couple initial records
 	INSERT INTO tab5 VALUES (1, 11, 111, 1111);
@@ -388,8 +390,7 @@ wait_for_subscription_sync($node_subscriber);
 
 $node_publisher->wait_for_catchup('sub1');
 
-# insert data and make sure all the columns (union of the columns lists)
-# get fully replicated
+# insert data and make sure the columns in the column list get fully replicated
 $node_publisher->safe_psql(
 	'postgres', qq(
 	INSERT INTO tab5 VALUES (3, 33, 333, 3333);
@@ -399,42 +400,12 @@ $node_publisher->safe_psql(
 $node_publisher->wait_for_catchup('sub1');
 
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111
-2|22|2222
-3|33|3333
-4|44|4444),
+	qq(1|11|
+2|22|
+3|33|
+4|44|),
 	'overlapping publications with overlapping column lists');
 
-# and finally, remove the column list for one of the publications, which
-# means replicating all columns (removing the column list), but first add
-# the missing column to the table on subscriber
-$node_publisher->safe_psql(
-	'postgres', qq(
-	ALTER PUBLICATION pub3 SET TABLE tab5;
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;
-	ALTER TABLE tab5 ADD COLUMN c INT;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO tab5 VALUES (5, 55, 555, 5555);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111|
-2|22|2222|
-3|33|3333|
-4|44|4444|
-5|55|5555|555),
-	'overlapping publications with overlapping column lists');
 
 # TEST: create a table with a column list, then change the replica
 # identity by replacing a primary key (but use a different column in
@@ -900,57 +871,21 @@ is( $node_subscriber->safe_psql(
 3|),
 	'partitions with different replica identities not replicated correctly');
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. So with column lists (a,b) and (a,c) we
-# should replicate (a,b,c).
 
-$node_publisher->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
-	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
-
-	-- initial data
-	INSERT INTO test_mix_1 VALUES (1, 2, 3);
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_mix_1, pub_mix_2;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO test_mix_1 VALUES (4, 5, 6);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql(
-		'postgres', "SELECT * FROM test_mix_1 ORDER BY a"),
-	qq(1|2|3
-4|5|6),
-	'a mix of publications should use a union of column list');
-
-
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES, we should replicate all columns.
+# TEST: If a table is included in a publication that is FOR ALL TABLES,
+# all columns are replicated.
 
 # drop unnecessary tables, so as not to interfere with the FOR ALL TABLES
 $node_publisher->safe_psql(
 	'postgres', qq(
-	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7, test_mix_1,
+	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7,
 			   test_part, test_part_a, test_part_b, test_part_c, test_part_d;
 ));
 
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE test_mix_2 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b);
+	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b, c);
 	CREATE PUBLICATION pub_mix_4 FOR ALL TABLES;
 
 	-- initial data
@@ -976,12 +911,11 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_2"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES IN SCHEMA, we should replicate all columns.
+# TEST: If a table is included in a publication that is FOR ALL TABLES
+# IN SCHEMA, all columns are replicated.
 
 $node_subscriber->safe_psql(
 	'postgres', qq(
@@ -993,7 +927,7 @@ $node_publisher->safe_psql(
 	'postgres', qq(
 	DROP TABLE test_mix_2;
 	CREATE TABLE test_mix_3 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b);
+	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b, c);
 	CREATE PUBLICATION pub_mix_6 FOR ALL TABLES IN SCHEMA public;
 
 	-- initial data
@@ -1017,7 +951,7 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_3"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
 # TEST: Check handling of publish_via_partition_root - if a partition is
@@ -1074,7 +1008,7 @@ is( $node_subscriber->safe_psql(
 # TEST: Multiple publications which publish schema of parent table and
 # partition. The partition is published through two publications, once
 # through a schema (so no column list) containing the parent, and then
-# also directly (with a columns list). The expected outcome is there is
+# also directly (with all columns). The expected outcome is there is
 # no column list.
 
 $node_publisher->safe_psql(
@@ -1086,7 +1020,7 @@ $node_publisher->safe_psql(
 	CREATE TABLE t_1 PARTITION OF s1.t FOR VALUES FROM (1) TO (10);
 
 	CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA s1;
-	CREATE PUBLICATION pub2 FOR TABLE t_1(b);
+	CREATE PUBLICATION pub2 FOR TABLE t_1(a, b, c);
 
 	-- initial data
 	INSERT INTO s1.t VALUES (1, 2, 3);
@@ -1233,6 +1167,27 @@ is( $node_subscriber->safe_psql(
 	'publication containing both parent and child relation');
 
 
+# TEST: With a table included in multiple publications with different column
+# lists, an error should be reported in the subscriber's log
+
+$node_publisher->safe_psql('postgres', qq(
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
+	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_mix_1, pub_mix_2 WITH (copy_data = false);
+));
+
+$node_publisher->wait_for_catchup('sub1');
+
+my $logfile = slurp_file($node_subscriber->logfile, $offset);
+ok( $logfile =~
+	  qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	'different column lists detected');
+
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
 
-- 
2.7.2.windows.1

Attachment: 0001-extent-pg_publication_tables.patch (application/octet-stream)
From 16d36b556d066480cda4dd170e13183aaf9d8173 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Fri, 13 May 2022 11:35:14 +0800
Subject: [PATCH] extend pg_publication_tables

Extend the pg_publication_tables view and the pg_get_publication_tables function
so that they also return the column list and row filter of each table. This
makes it easier for users and developers to fetch the column lists and row filters.

---
 doc/src/sgml/catalogs.sgml                  | 20 +++++++++++
 src/backend/catalog/pg_publication.c        | 52 ++++++++++++++++++++++++++++-
 src/backend/catalog/system_views.sql        |  4 ++-
 src/backend/replication/logical/tablesync.c | 14 +++-----
 src/include/catalog/pg_proc.dat             |  6 ++--
 src/test/regress/expected/publication.out   | 42 +++++++++++------------
 src/test/regress/expected/rules.out         |  6 ++--
 7 files changed, 106 insertions(+), 38 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a533a21..fd0c61b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11687,6 +11687,26 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        Name of table
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>columnlist</structfield> <type>int2vector</type>
+       (references <link linkend="catalog-pg-publication-rel"><structname>pg_publication_rel</structname></link>.<structfield>prattrs</structfield>)
+      </para>
+      <para>
+       Column list of table
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>rowfilter</structfield> <type>text</type>
+       (references <link linkend="catalog-pg-publication-rel"><structname>pg_publication_rel</structname></link>.<structfield>prqual</structfield>)
+      </para>
+      <para>
+       Row filter of table
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index e2c8bcb..3fe607e 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1082,6 +1082,7 @@ get_publication_name(Oid pubid, bool missing_ok)
 Datum
 pg_get_publication_tables(PG_FUNCTION_ARGS)
 {
+#define NUM_PUBLICATION_TABLES_ELEM	3
 	FuncCallContext *funcctx;
 	char	   *pubname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 	Publication *publication;
@@ -1090,6 +1091,7 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 	/* stuff done only on the first call of the function */
 	if (SRF_IS_FIRSTCALL())
 	{
+		TupleDesc	tupdesc;
 		MemoryContext oldcontext;
 
 		/* create a function context for cross-call persistence */
@@ -1136,6 +1138,16 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 				tables = filter_partitions(tables);
 		}
 
+		/* Construct a tuple descriptor for the result rows. */
+		tupdesc = CreateTemplateTupleDesc(NUM_PUBLICATION_TABLES_ELEM);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "relid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "attrs",
+						   INT2VECTOROID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "qual",
+						   PG_NODE_TREEOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
 		funcctx->user_fctx = (void *) tables;
 
 		MemoryContextSwitchTo(oldcontext);
@@ -1147,9 +1159,47 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 
 	if (funcctx->call_cntr < list_length(tables))
 	{
+		HeapTuple	pubtuple = NULL;
+		HeapTuple	rettuple;
 		Oid			relid = list_nth_oid(tables, funcctx->call_cntr);
+		Datum		values[NUM_PUBLICATION_TABLES_ELEM];
+		bool		nulls[NUM_PUBLICATION_TABLES_ELEM];
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(nulls, 0, sizeof(nulls));
+		MemSet(values, 0, sizeof(values));
+
+		publication = GetPublicationByName(pubname, false);
+
+		values[0] = ObjectIdGetDatum(relid);
+
+		pubtuple = SearchSysCacheCopy2(PUBLICATIONRELMAP,
+									   ObjectIdGetDatum(relid),
+									   ObjectIdGetDatum(publication->oid));
+
+		if (HeapTupleIsValid(pubtuple))
+		{
+			/* Lookup the column list attribute. */
+			values[1] = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
+										Anum_pg_publication_rel_prattrs,
+										&(nulls[1]));
+
+			/* Null indicates no filter. */
+			values[2] = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
+									  Anum_pg_publication_rel_prqual,
+									  &(nulls[2]));
+		}
+		else
+		{
+			nulls[1] = true;
+			nulls[2] = true;
+		}
+
+		rettuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
 
-		SRF_RETURN_NEXT(funcctx, ObjectIdGetDatum(relid));
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(rettuple));
 	}
 
 	SRF_RETURN_DONE(funcctx);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0fc614e..f758f15 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -368,7 +368,9 @@ CREATE VIEW pg_publication_tables AS
     SELECT
         P.pubname AS pubname,
         N.nspname AS schemaname,
-        C.relname AS tablename
+        C.relname AS tablename,
+        GPT.attrs AS columnlist,
+        pg_get_expr(GPT.qual, GPT.relid) AS rowfilter
     FROM pg_publication P,
          LATERAL pg_get_publication_tables(P.pubname) GPT,
          pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index b03e0f5..994c7a0 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -795,15 +795,12 @@ fetch_remote_table_info(char *nspname, char *relname,
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
 						 "SELECT DISTINCT unnest"
-						 "  FROM pg_publication p"
-						 "  LEFT OUTER JOIN pg_publication_rel pr"
-						 "       ON (p.oid = pr.prpubid AND pr.prrelid = %u)"
-						 "  LEFT OUTER JOIN unnest(pr.prattrs) ON TRUE,"
+						 "  FROM pg_publication p,"
 						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
+						 "  LEFT OUTER JOIN unnest(gpt.attrs) ON TRUE"
 						 " WHERE gpt.relid = %u"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
-						 lrel->remoteid,
 						 pub_names.data);
 
 		pubres = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data,
@@ -965,15 +962,12 @@ fetch_remote_table_info(char *nspname, char *relname,
 		/* Check for row filters. */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
-						 "  FROM pg_publication p"
-						 "  LEFT OUTER JOIN pg_publication_rel pr"
-						 "       ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
+						 "SELECT DISTINCT pg_get_expr(gpt.qual, gpt.relid)"
+						 "  FROM pg_publication p,"
 						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
 						 " WHERE gpt.relid = %u"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
-						 lrel->remoteid,
 						 pub_names.data);
 
 		res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, 1, qualRow);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index babe16f..cf6235d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11675,9 +11675,9 @@
 # publications
 { oid => '6119', descr => 'get OIDs of tables in a publication',
   proname => 'pg_get_publication_tables', prorows => '1000', proretset => 't',
-  provolatile => 's', prorettype => 'oid', proargtypes => 'text',
-  proallargtypes => '{text,oid}', proargmodes => '{i,o}',
-  proargnames => '{pubname,relid}', prosrc => 'pg_get_publication_tables' },
+  provolatile => 's', prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,oid,int2vector,pg_node_tree}', proargmodes => '{i,o,o,o}',
+  proargnames => '{pubname,relid,attrs,qual}', prosrc => 'pg_get_publication_tables' },
 { oid => '6121',
   descr => 'returns whether a relation can be part of a publication',
   proname => 'pg_relation_is_publishable', provolatile => 's',
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 398c0f3..4ebafb3 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -1585,52 +1585,52 @@ CREATE TABLE sch2.tbl1_part1 PARTITION OF sch1.tbl1 FOR VALUES FROM (1) to (10);
 -- Schema publication that does not include the schema that has the parent table
 CREATE PUBLICATION pub FOR ALL TABLES IN SCHEMA sch2 WITH (PUBLISH_VIA_PARTITION_ROOT=1);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | columnlist | rowfilter 
+---------+------------+------------+------------+-----------
+ pub     | sch2       | tbl1_part1 |            | 
 (1 row)
 
 DROP PUBLICATION pub;
 -- Table publication that does not include the parent table
 CREATE PUBLICATION pub FOR TABLE sch2.tbl1_part1 WITH (PUBLISH_VIA_PARTITION_ROOT=1);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | columnlist | rowfilter 
+---------+------------+------------+------------+-----------
+ pub     | sch2       | tbl1_part1 |            | 
 (1 row)
 
 -- Table publication that includes both the parent table and the child table
 ALTER PUBLICATION pub ADD TABLE sch1.tbl1;
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename 
----------+------------+-----------
- pub     | sch1       | tbl1
+ pubname | schemaname | tablename | columnlist | rowfilter 
+---------+------------+-----------+------------+-----------
+ pub     | sch1       | tbl1      |            | 
 (1 row)
 
 DROP PUBLICATION pub;
 -- Schema publication that does not include the schema that has the parent table
 CREATE PUBLICATION pub FOR ALL TABLES IN SCHEMA sch2 WITH (PUBLISH_VIA_PARTITION_ROOT=0);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | columnlist | rowfilter 
+---------+------------+------------+------------+-----------
+ pub     | sch2       | tbl1_part1 |            | 
 (1 row)
 
 DROP PUBLICATION pub;
 -- Table publication that does not include the parent table
 CREATE PUBLICATION pub FOR TABLE sch2.tbl1_part1 WITH (PUBLISH_VIA_PARTITION_ROOT=0);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | columnlist | rowfilter 
+---------+------------+------------+------------+-----------
+ pub     | sch2       | tbl1_part1 |            | 
 (1 row)
 
 -- Table publication that includes both the parent table and the child table
 ALTER PUBLICATION pub ADD TABLE sch1.tbl1;
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | columnlist | rowfilter 
+---------+------------+------------+------------+-----------
+ pub     | sch2       | tbl1_part1 |            | 
 (1 row)
 
 DROP PUBLICATION pub;
@@ -1643,9 +1643,9 @@ CREATE TABLE sch1.tbl1_part3 (a int) PARTITION BY RANGE(a);
 ALTER TABLE sch1.tbl1 ATTACH PARTITION sch1.tbl1_part3 FOR VALUES FROM (20) to (30);
 CREATE PUBLICATION pub FOR ALL TABLES IN SCHEMA sch1 WITH (PUBLISH_VIA_PARTITION_ROOT=1);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename 
----------+------------+-----------
- pub     | sch1       | tbl1
+ pubname | schemaname | tablename | columnlist | rowfilter 
+---------+------------+-----------+------------+-----------
+ pub     | sch1       | tbl1      |            | 
 (1 row)
 
 RESET client_min_messages;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 21effe8..c7f8300 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1437,9 +1437,11 @@ pg_prepared_xacts| SELECT p.transaction,
      LEFT JOIN pg_database d ON ((p.dbid = d.oid)));
 pg_publication_tables| SELECT p.pubname,
     n.nspname AS schemaname,
-    c.relname AS tablename
+    c.relname AS tablename,
+    gpt.attrs AS columnlist,
+    pg_get_expr(gpt.qual, gpt.relid) AS rowfilter
    FROM pg_publication p,
-    LATERAL pg_get_publication_tables((p.pubname)::text) gpt(relid),
+    LATERAL pg_get_publication_tables((p.pubname)::text) gpt(relid, attrs, qual),
     (pg_class c
      JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
   WHERE (c.oid = gpt.relid);
-- 
2.7.2.windows.1

#60Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#59)
Re: bogus: logical replication rows/cols combinations

On Fri, May 13, 2022 at 11:32 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Thursday, May 12, 2022 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 11, 2022 at 12:55 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Few comments:
===============
1.
initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname,
t.tablename\n"
-    "  FROM pg_catalog.pg_publication_tables t\n"
+ appendStringInfoString(&cmd,
+    "SELECT DISTINCT t.schemaname,\n"
+    "                t.tablename,\n"
+    "                (CASE WHEN (array_length(pr.prattrs, 1) = t.relnatts)\n"
+    "                THEN NULL ELSE pr.prattrs END)\n"
+    "  FROM (SELECT P.pubname AS pubname,\n"
+    "               N.nspname AS schemaname,\n"
+    "               C.relname AS tablename,\n"
+    "               P.oid AS pubid,\n"
+    "               C.oid AS reloid,\n"
+    "               C.relnatts\n"
+    "          FROM pg_publication P,\n"
+    "          LATERAL pg_get_publication_tables(P.pubname) GPT,\n"
+    "          pg_class C JOIN pg_namespace N\n"
+    "                     ON (N.oid = C.relnamespace)\n"
+    "          WHERE C.oid = GPT.relid) t\n"
+    "  LEFT OUTER JOIN pg_publication_rel pr\n"
+    "       ON (t.pubid = pr.prpubid AND\n"
+    "        pr.prrelid = reloid)\n"

Can we modify pg_publication_tables to get the row filter and column list and
then use it directly instead of constructing this query?

Agreed. If we can get columnlist and rowfilter from pg_publication_tables, it
will be more convenient. And I think users that want to fetch the columnlist
and rowfilter of a table can also benefit from it.

After the change for this, we will give an error on combining
publications where one of the publications specifies all columns in
the table and the other doesn't provide any columns. We should not
give an error as both mean all columns.
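The equivalence described here (a column list naming every column means the same as no column list at all) is what the patch's `CASE WHEN array_length(gpt.attrs, 1) = c.relnatts THEN NULL` clause implements. As a rough illustration only (the helper name and Python representation are hypothetical, not part of the patch):

```python
def normalize_column_list(attrs, relnatts):
    """Normalize a publication's column list before comparing lists.

    attrs is the list of published attribute numbers, or None when the
    publication has no column list; relnatts is the table's column count.
    A list that names every column is equivalent to having no list, so
    both are normalized to None and therefore compare as equal.
    """
    if attrs is not None and len(attrs) == relnatts:
        return None
    return attrs
```

With this normalization, a publication listing all three columns of a three-column table and a publication with no column list no longer look different, so no spurious "different column lists" error is raised.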

Attached is the new version patch, which addresses these comments and updates the
documentation. The 0001 patch extends the view and the 0002 patch adds the
restriction on column lists.

Few comments:
=================
1.
postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename | columnlist | rowfilter
---------+------------+-----------+------------+-----------
 pub1    | public     | t1        |            |
 pub2    | public     | t1        | 1 2        | (c3 < 10)
(2 rows)

I think it is better to display column names for columnlist in the
exposed view similar to attnames in the pg_stats_ext view. I think
that will make it easier for users to understand this information.
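As a sketch of that suggestion (hypothetical helper, not the patch's SQL): since `prattrs` stores 1-based attribute numbers, mapping them to names only needs the relation's columns in attnum order:

```python
def attnums_to_names(attnums, relation_columns):
    """Translate 1-based attribute numbers (as stored in prattrs) into
    column names, given the relation's column names in attnum order.
    None means the publication has no column list (all columns)."""
    if attnums is None:
        return None
    return [relation_columns[n - 1] for n in attnums]
```

For the example above, attnums `1 2` on a table `t1 (c1, c2, c3)` would then display as column names rather than as raw numbers, mirroring how `pg_stats_ext.attnames` is presented.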

2.
 { oid => '6119', descr => 'get OIDs of tables in a publication',
   proname => 'pg_get_publication_tables', prorows => '1000', proretset => 't',
-  provolatile => 's', prorettype => 'oid', proargtypes => 'text',
-  proallargtypes => '{text,oid}', proargmodes => '{i,o}',
-  proargnames => '{pubname,relid}', prosrc => 'pg_get_publication_tables' },
+  provolatile => 's', prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,oid,int2vector,pg_node_tree}', proargmodes => '{i,o,o,o}',

I think we should change the "descr" to something like: 'get
information of tables in a publication'

3.
+
+ /*
+ * We only throw a warning here so that the subscription can still be
+ * created and the user is made aware that something is going to fail later and
+ * they can fix the publications afterwards.
+ */
+ if (list_member(tablelist, rv))
+ ereport(WARNING,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use different column lists for table \"%s.%s\" in
different publications",
+    nspname, relname));

Can we extend this comment to explain the case where after Alter
Publication, if the user dumps and restores back the subscription,
there is a possibility that "CREATE SUBSCRIPTION" won't work if we
give ERROR here instead of WARNING?

--
With Regards,
Amit Kapila.

#61houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#60)
2 attachment(s)
RE: bogus: logical replication rows/cols combinations

On Monday, May 16, 2022 2:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, May 13, 2022 at 11:32 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Thursday, May 12, 2022 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 11, 2022 at 12:55 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Few comments:
===============
1.
initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname,
t.tablename\n"
-    "  FROM pg_catalog.pg_publication_tables t\n"
+ appendStringInfoString(&cmd,
+    "SELECT DISTINCT t.schemaname,\n"
+    "                t.tablename,\n"
+    "                (CASE WHEN (array_length(pr.prattrs, 1) = t.relnatts)\n"
+    "                THEN NULL ELSE pr.prattrs END)\n"
+    "  FROM (SELECT P.pubname AS pubname,\n"
+    "               N.nspname AS schemaname,\n"
+    "               C.relname AS tablename,\n"
+    "               P.oid AS pubid,\n"
+    "               C.oid AS reloid,\n"
+    "               C.relnatts\n"
+    "          FROM pg_publication P,\n"
+    "          LATERAL pg_get_publication_tables(P.pubname) GPT,\n"
+    "          pg_class C JOIN pg_namespace N\n"
+    "                     ON (N.oid = C.relnamespace)\n"
+    "          WHERE C.oid = GPT.relid) t\n"
+    "  LEFT OUTER JOIN pg_publication_rel pr\n"
+    "       ON (t.pubid = pr.prpubid AND\n"
+    "        pr.prrelid = reloid)\n"

Can we modify pg_publication_tables to get the row filter and column list and then use it directly instead of constructing this query?

Agreed. If we can get columnlist and rowfilter from pg_publication_tables, it
will be more convenient. And I think users that want to fetch the columnlist
and rowfilter of a table can also benefit from it.

After the change for this, we will give an error on combining
publications where one of the publications specifies all columns in
the table and the other doesn't provide any columns. We should not
give an error as both mean all columns.

Thanks for the comments. Fixed.

Attached is the new version patch, which addresses these comments and updates the
documentation. The 0001 patch extends the view and the 0002 patch adds the
restriction on column lists.

Few comments:
=================
1.
postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename | columnlist | rowfilter
---------+------------+-----------+------------+-----------
 pub1    | public     | t1        |            |
 pub2    | public     | t1        | 1 2        | (c3 < 10)
(2 rows)

I think it is better to display column names for columnlist in the
exposed view similar to attnames in the pg_stats_ext view. I think
that will make it easier for users to understand this information.

Agreed and changed.

2.
{ oid => '6119', descr => 'get OIDs of tables in a publication',
  proname => 'pg_get_publication_tables', prorows => '1000', proretset => 't',
-  provolatile => 's', prorettype => 'oid', proargtypes => 'text',
-  proallargtypes => '{text,oid}', proargmodes => '{i,o}',
-  proargnames => '{pubname,relid}', prosrc => 'pg_get_publication_tables' },
+  provolatile => 's', prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,oid,int2vector,pg_node_tree}', proargmodes => '{i,o,o,o}',

I think we should change the "descr" to something like: 'get
information of tables in a publication'

Changed.

3.
+
+ /*
+ * We only throw a warning here so that the subcription can still be
+ * created and let user aware that something is going to fail later and
+ * they can fix the publications afterwards.
+ */
+ if (list_member(tablelist, rv))
+ ereport(WARNING,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use different column lists for table \"%s.%s\" in
different publications",
+    nspname, relname));

Can we extend this comment to explain the case where after Alter
Publication, if the user dumps and restores back the subscription,
there is a possibility that "CREATE SUBSCRIPTION" won't work if we
give ERROR here instead of WARNING?

After rethinking this, it seems OK to report an ERROR here, as pg_dump of a
subscription always sets (connect = false). So, we won't hit the check when
restoring the dump, which means the restore can succeed even if the user changes
the publication afterwards. Based on this, I have changed the warning to an error.
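The subscriber-side check relies on the `DISTINCT` in the `fetch_table_list` query: identical (schema, table, attnames) rows collapse, so a repeated (schema, table) key can only mean the column lists differ. A rough Python sketch of that logic (illustration only; the function name is hypothetical):

```python
def detect_column_list_conflicts(rows):
    """rows: DISTINCT (schema, table, attnames) tuples, as returned by
    the query against pg_publication_tables. Raise if the same table
    appears twice; because identical rows were already collapsed by
    DISTINCT, a duplicate key implies the table is published with
    different column lists in different publications."""
    seen = set()
    for schema, table, attnames in rows:
        key = (schema, table)
        if key in seen:
            raise ValueError(
                f'cannot use different column lists for table '
                f'"{schema}.{table}" in different publications')
        seen.add(key)
    return seen
```

Note this is why the full-column-list-vs-no-list normalization matters: without it, `(a,b,c)` and no list would produce two distinct rows for the same table and trigger a spurious error.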

Attach the new version patch.

Best regards,
Hou zj

Attachments:

Attachment: v2-0002-Prohibit-combining-publications-with-different-colum.patch (application/octet-stream)
From aa3fa3a3de41b37f292308654c697a8c23ebb6a7 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Fri, 13 May 2022 13:19:49 +0800
Subject: [PATCH] Prohibit combining publications with different column lists.

The main purpose of introducing a column list is to have statically
different shapes on the publisher and subscriber, or to hide sensitive
column data. In either case, it doesn't seem to make sense to combine
column lists.

So, we disallow the cases where the column list is different for the same
table when combining publications. This can later be extended to combine
the column lists for selective cases where required.

Reported-by: Alvaro Herrera
Author: Hou Zhijie
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
---
 doc/src/sgml/ref/alter_publication.sgml     |  12 +++-
 doc/src/sgml/ref/create_subscription.sgml   |   5 ++
 src/backend/commands/subscriptioncmds.c     |  23 ++++--
 src/backend/replication/logical/tablesync.c |  60 ++++++++++------
 src/backend/replication/pgoutput/pgoutput.c |  77 ++++++++++----------
 src/test/subscription/t/031_column_list.pl  | 105 +++++-----------------------
 6 files changed, 127 insertions(+), 155 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index e2cce49..f03933a 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -116,7 +116,17 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
 
      <para>
       Optionally, a column list can be specified.  See <xref
-      linkend="sql-createpublication"/> for details.
+      linkend="sql-createpublication"/> for details. Note that a subscription
+      having several publications in which the same table has been published
+      with different column lists is not supported. So, changing the column
+      lists of the tables being subscribed could cause inconsistency of column
+      lists among publications, in which case <command>ALTER PUBLICATION</command>
+      will succeed but later the walsender on the publisher or the
+      subscriber may throw an error. In this scenario, the user needs to
+      recreate the subscription after adjusting the column list or drop the
+      problematic publication using
+      <literal>ALTER SUBSCRIPTION ... DROP PUBLICATION</literal> and then add
+      it back after adjusting the column list.
      </para>
 
      <para>
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 203bb41..f6f82a0 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -356,6 +356,11 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
   </para>
 
   <para>
+   A subscription having several publications in which the same table has been
+   published with different column lists is not supported.
+  </para>
+
+  <para>
    We allow non-existent publications to be specified so that users can add
    those later. This means
    <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 690cdaa..991b2c1 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1753,7 +1753,8 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
 
 /*
  * Get the list of tables which belong to specified publications on the
- * publisher connection.
+ * publisher connection. Also get the column list for each table and check if
+ * column lists are the same in different publications.
  */
 static List *
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
@@ -1761,17 +1762,18 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 	WalRcvExecResult *res;
 	StringInfoData cmd;
 	TupleTableSlot *slot;
-	Oid			tableRow[2] = {TEXTOID, TEXTOID};
+	Oid			tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
 	List	   *tablelist = NIL;
 
 	initStringInfo(&cmd);
-	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
+	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename, \n"
+						   "                      t.attnames\n"
 						   "  FROM pg_catalog.pg_publication_tables t\n"
 						   " WHERE t.pubname IN (");
 	get_publications_str(publications, &cmd, true);
 	appendStringInfoChar(&cmd, ')');
 
-	res = walrcv_exec(wrconn, cmd.data, 2, tableRow);
+	res = walrcv_exec(wrconn, cmd.data, 3, tableRow);
 	pfree(cmd.data);
 
 	if (res->status != WALRCV_OK_TUPLES)
@@ -1795,7 +1797,18 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 		Assert(!isnull);
 
 		rv = makeRangeVar(nspname, relname, -1);
-		tablelist = lappend(tablelist, rv);
+
+		/*
+		 * We don't support the case where the column list is different for
+		 * the same table in different publications.
+		 */
+		if (list_member(tablelist, rv))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+		else
+			tablelist = lappend(tablelist, rv);
 
 		ExecClearTuple(slot);
 	}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 994c7a0..9ce4b9f 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -771,7 +771,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	{
 		WalRcvExecResult *pubres;
 		TupleTableSlot *slot;
-		Oid			attrsRow[] = {INT2OID};
+		Oid			attrsRow[] = {INT2VECTOROID};
 		StringInfoData pub_names;
 		bool		first = true;
 
@@ -786,19 +786,17 @@ fetch_remote_table_info(char *nspname, char *relname,
 
 		/*
 		 * Fetch info about column lists for the relation (from all the
-		 * publications). We unnest the int2vector values, because that makes
-		 * it easier to combine lists by simply adding the attnums to a new
-		 * bitmap (without having to parse the int2vector data). This
-		 * preserves NULL values, so that if one of the publications has no
-		 * column list, we'll know that.
+		 * publications).
 		 */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT unnest"
+						 "SELECT DISTINCT"
+						 "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
+						 "   THEN NULL ELSE gpt.attrs END)"
 						 "  FROM pg_publication p,"
-						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
-						 "  LEFT OUTER JOIN unnest(gpt.attrs) ON TRUE"
-						 " WHERE gpt.relid = %u"
+						 "  LATERAL pg_get_publication_tables(p.pubname) gpt,"
+						 "  pg_class c"
+						 " WHERE gpt.relid = %u AND c.oid = gpt.relid"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
 						 pub_names.data);
@@ -813,26 +811,42 @@ fetch_remote_table_info(char *nspname, char *relname,
 							nspname, relname, pubres->err)));
 
 		/*
-		 * Merge the column lists (from different publications) by creating a
-		 * single bitmap with all the attnums. If we find a NULL value, that
-		 * means one of the publications has no column list for the table
-		 * we're syncing.
+		 * We don't support the case where the column list is different for
+		 * the same table when combining publications, so there should be only
+		 * one row returned. Although we already checked this when creating
+		 * the subscription, we still need to check here in case the column
+		 * list was changed afterwards.
+		 */
+		if (tuplestore_tuple_count(pubres->tuplestore) > 1)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+
+		/*
+		 * Get the column list and build a single bitmap with the attnums.
+		 *
+		 * If we find a NULL value, it means all the columns should be
+		 * replicated.
 		 */
 		slot = MakeSingleTupleTableSlot(pubres->tupledesc, &TTSOpsMinimalTuple);
-		while (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
+		if (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
 		{
 			Datum		cfval = slot_getattr(slot, 1, &isnull);
 
-			/* NULL means empty column list, so we're done. */
-			if (isnull)
+			if (!isnull)
 			{
-				bms_free(included_cols);
-				included_cols = NULL;
-				break;
-			}
+				ArrayType  *arr;
+				int			nelems;
+				int16	   *elems;
+
+				arr = DatumGetArrayTypeP(cfval);
+				nelems = ARR_DIMS(arr)[0];
+				elems = (int16 *) ARR_DATA_PTR(arr);
 
-			included_cols = bms_add_member(included_cols,
-										   DatumGetInt16(cfval));
+				for (natt = 0; natt < nelems; natt++)
+					included_cols = bms_add_member(included_cols, elems[natt]);
+			}
 
 			ExecClearTuple(slot);
 		}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 42c06af..3d888a5 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -979,30 +979,30 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 						  RelationSyncEntry *entry)
 {
 	ListCell   *lc;
+	bool		first = true;
+	Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
 
 	/*
 	 * Find if there are any column lists for this relation. If there are,
-	 * build a bitmap merging all the column lists.
+	 * build a bitmap using the column lists.
 	 *
-	 * All the given publication-table mappings must be checked.
+	 * Note that we don't support the case where the column list is different
+	 * for the same table when combining publications. But we still need to
+	 * check all the given publication-table mappings and report an error if
+	 * any publications have different column lists.
 	 *
 	 * Multiple publications might have multiple column lists for this
 	 * relation.
 	 *
 	 * FOR ALL TABLES and FOR ALL TABLES IN SCHEMA implies "don't use column
-	 * list" so it takes precedence.
+	 * list".
 	 */
 	foreach(lc, publications)
 	{
 		Publication *pub = lfirst(lc);
 		HeapTuple	cftuple = NULL;
 		Datum		cfdatum = 0;
-
-		/*
-		 * Assume there's no column list. Only if we find pg_publication_rel
-		 * entry with a column list we'll switch it to false.
-		 */
-		bool		pub_no_list = true;
+		Bitmapset	*cols = NULL;
 
 		/*
 		 * If the publication is FOR ALL TABLES then it is treated the same as
@@ -1011,6 +1011,8 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		 */
 		if (!pub->alltables)
 		{
+			bool		pub_no_list = true;
+
 			/*
 			 * Check for the presence of a column list in this publication.
 			 *
@@ -1024,51 +1026,48 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 
 			if (HeapTupleIsValid(cftuple))
 			{
-				/*
-				 * Lookup the column list attribute.
-				 *
-				 * Note: We update the pub_no_list value directly, because if
-				 * the value is NULL, we have no list (and vice versa).
-				 */
+				/* Lookup the column list attribute. */
 				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
 										  Anum_pg_publication_rel_prattrs,
 										  &pub_no_list);
 
-				/*
-				 * Build the column list bitmap in the per-entry context.
-				 *
-				 * We need to merge column lists from all publications, so we
-				 * update the same bitmapset. If the column list is null, we
-				 * interpret it as replicating all columns.
-				 */
+				/* Build the column list bitmap in the per-entry context. */
 				if (!pub_no_list)	/* when not null */
 				{
 					pgoutput_ensure_entry_cxt(data, entry);
 
-					entry->columns = pub_collist_to_bitmapset(entry->columns,
-															  cfdatum,
-															  entry->entry_cxt);
+					cols = pub_collist_to_bitmapset(cols, cfdatum,
+													entry->entry_cxt);
+
+					/*
+					 * If column list includes all the columns of the table,
+					 * set it to NULL.
+					 */
+					if (bms_num_members(cols) == RelationGetNumberOfAttributes(relation))
+					{
+						bms_free(cols);
+						cols = NULL;
+					}
 				}
+
+				ReleaseSysCache(cftuple);
 			}
 		}
 
-		/*
-		 * Found a publication with no column list, so we're done. But first
-		 * discard column list we might have from preceding publications.
-		 */
-		if (pub_no_list)
+		if (first)
 		{
-			if (cftuple)
-				ReleaseSysCache(cftuple);
-
-			bms_free(entry->columns);
-			entry->columns = NULL;
-
-			break;
+			entry->columns = cols;
+			first = false;
 		}
-
-		ReleaseSysCache(cftuple);
+		else if (!bms_equal(entry->columns, cols))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   get_namespace_name(RelationGetNamespace(relation)),
+						   RelationGetRelationName(relation)));
 	}							/* loop all subscribed publications */
+
+	RelationClose(relation);
 }
 
 /*
diff --git a/src/test/subscription/t/031_column_list.pl b/src/test/subscription/t/031_column_list.pl
index 19812e1..22be6fd 100644
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -361,13 +361,13 @@ is( $result, qq(1|abc
 2|xyz), 'update on column tab2.c is not replicated');
 
 
-# TEST: add a table to two publications with different column lists, and
+# TEST: add a table to two publications with the same column list, and
 # create a single subscription replicating both publications
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE tab5 (a int PRIMARY KEY, b int, c int, d int);
 	CREATE PUBLICATION pub2 FOR TABLE tab5 (a, b);
-	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, d);
+	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, b);
 
 	-- insert a couple initial records
 	INSERT INTO tab5 VALUES (1, 11, 111, 1111);
@@ -388,8 +388,7 @@ wait_for_subscription_sync($node_subscriber);
 
 $node_publisher->wait_for_catchup('sub1');
 
-# insert data and make sure all the columns (union of the columns lists)
-# get fully replicated
+# insert data and make sure the columns in the column list get fully replicated
 $node_publisher->safe_psql(
 	'postgres', qq(
 	INSERT INTO tab5 VALUES (3, 33, 333, 3333);
@@ -399,42 +398,12 @@ $node_publisher->safe_psql(
 $node_publisher->wait_for_catchup('sub1');
 
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111
-2|22|2222
-3|33|3333
-4|44|4444),
+	qq(1|11|
+2|22|
+3|33|
+4|44|),
 	'overlapping publications with overlapping column lists');
 
-# and finally, remove the column list for one of the publications, which
-# means replicating all columns (removing the column list), but first add
-# the missing column to the table on subscriber
-$node_publisher->safe_psql(
-	'postgres', qq(
-	ALTER PUBLICATION pub3 SET TABLE tab5;
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;
-	ALTER TABLE tab5 ADD COLUMN c INT;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO tab5 VALUES (5, 55, 555, 5555);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111|
-2|22|2222|
-3|33|3333|
-4|44|4444|
-5|55|5555|555),
-	'overlapping publications with overlapping column lists');
 
 # TEST: create a table with a column list, then change the replica
 # identity by replacing a primary key (but use a different column in
@@ -900,57 +869,21 @@ is( $node_subscriber->safe_psql(
 3|),
 	'partitions with different replica identities not replicated correctly');
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. So with column lists (a,b) and (a,c) we
-# should replicate (a,b,c).
 
-$node_publisher->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
-	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
-
-	-- initial data
-	INSERT INTO test_mix_1 VALUES (1, 2, 3);
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_mix_1, pub_mix_2;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO test_mix_1 VALUES (4, 5, 6);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql(
-		'postgres', "SELECT * FROM test_mix_1 ORDER BY a"),
-	qq(1|2|3
-4|5|6),
-	'a mix of publications should use a union of column list');
-
-
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES, we should replicate all columns.
+# TEST: If a table is included in a publication which is FOR ALL TABLES,
+# all columns are replicated.
 
 # drop unnecessary tables, so as not to interfere with the FOR ALL TABLES
 $node_publisher->safe_psql(
 	'postgres', qq(
-	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7, test_mix_1,
+	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7,
 			   test_part, test_part_a, test_part_b, test_part_c, test_part_d;
 ));
 
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE test_mix_2 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b);
+	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b, c);
 	CREATE PUBLICATION pub_mix_4 FOR ALL TABLES;
 
 	-- initial data
@@ -976,12 +909,11 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_2"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES IN SCHEMA, we should replicate all columns.
+# TEST: If a table is included in a publication which is FOR ALL TABLES
+# IN SCHEMA, all columns are replicated.
 
 $node_subscriber->safe_psql(
 	'postgres', qq(
@@ -993,7 +925,7 @@ $node_publisher->safe_psql(
 	'postgres', qq(
 	DROP TABLE test_mix_2;
 	CREATE TABLE test_mix_3 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b);
+	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b, c);
 	CREATE PUBLICATION pub_mix_6 FOR ALL TABLES IN SCHEMA public;
 
 	-- initial data
@@ -1017,7 +949,7 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_3"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
 # TEST: Check handling of publish_via_partition_root - if a partition is
@@ -1074,7 +1006,7 @@ is( $node_subscriber->safe_psql(
 # TEST: Multiple publications which publish schema of parent table and
 # partition. The partition is published through two publications, once
 # through a schema (so no column list) containing the parent, and then
-# also directly (with a columns list). The expected outcome is there is
+# also directly (with all columns). The expected outcome is there is
 # no column list.
 
 $node_publisher->safe_psql(
@@ -1086,7 +1018,7 @@ $node_publisher->safe_psql(
 	CREATE TABLE t_1 PARTITION OF s1.t FOR VALUES FROM (1) TO (10);
 
 	CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA s1;
-	CREATE PUBLICATION pub2 FOR TABLE t_1(b);
+	CREATE PUBLICATION pub2 FOR TABLE t_1(a, b, c);
 
 	-- initial data
 	INSERT INTO s1.t VALUES (1, 2, 3);
@@ -1232,7 +1164,6 @@ is( $node_subscriber->safe_psql(
 4||),
 	'publication containing both parent and child relation');
 
-
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
 
-- 
2.7.2.windows.1
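For anyone reproducing this, the restriction added by the patch above can be exercised with the setup from the original report (the connection string is illustrative); the error text matches the new ereport in fetch_table_list():

```sql
-- Publisher: two publications with different column lists for one table
CREATE TABLE uno (a int PRIMARY KEY, b int, c int);
CREATE PUBLICATION uno FOR TABLE uno (a, b) WHERE (a > 0);
CREATE PUBLICATION dos FOR TABLE uno (a, c) WHERE (a < 0);

-- Subscriber: with the patch applied, combining them is rejected
CREATE TABLE uno (a int PRIMARY KEY, b int, c int);
CREATE SUBSCRIPTION sub_uno CONNECTION 'port=55432 dbname=alvherre'
       PUBLICATION uno, dos;
-- ERROR:  cannot use different column lists for table "public.uno" in different publications
```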

Attachment: v2-0001-Extend-pg_publication_tables-to-display-column-list-.patch (application/octet-stream)
From 11809a7d7052d0b894f55c22f0660a5fdc217082 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Fri, 13 May 2022 11:35:14 +0800
Subject: [PATCH 1/2] Extend pg_publication_tables to display column list and
 row filter.

Commit 923def9a53 and 52e4f0cd47 allowed to specify column lists and row
filters for publication tables. This commit extends the
pg_publication_tables view and pg_get_publication_tables function to
display that information.

This information will be useful to users and we also need this for the
later commit that prohibits combining multiple publications with different
column lists for the same table.

Author: Hou Zhijie
Reviewed By: Amit Kapila
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
---
 doc/src/sgml/catalogs.sgml                  | 27 +++++++++++++--
 src/backend/catalog/pg_publication.c        | 52 ++++++++++++++++++++++++++++-
 src/backend/catalog/system_views.sql        | 10 +++++-
 src/backend/replication/logical/tablesync.c | 14 +++-----
 src/include/catalog/pg_proc.dat             |  8 ++---
 src/test/regress/expected/publication.out   | 42 +++++++++++------------
 src/test/regress/expected/rules.out         | 13 ++++++--
 7 files changed, 124 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a533a21..008f81a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9691,7 +9691,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 
      <row>
       <entry><link linkend="view-pg-publication-tables"><structname>pg_publication_tables</structname></link></entry>
-      <entry>publications and their associated tables</entry>
+      <entry>publications and information about their associated tables</entry>
      </row>
 
      <row>
@@ -11635,8 +11635,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   <para>
    The view <structname>pg_publication_tables</structname> provides
-   information about the mapping between publications and the tables they
-   contain.  Unlike the underlying catalog
+   information about the mapping between publications and the tables they
+   contain, including column list and row filter information.  Unlike the underlying catalog
    <link linkend="catalog-pg-publication-rel"><structname>pg_publication_rel</structname></link>,
    this view expands publications defined as <literal>FOR ALL TABLES</literal>
    and <literal>FOR ALL TABLES IN SCHEMA</literal>, so for such publications
@@ -11687,6 +11687,27 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        Name of table
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>attnames</structfield> <type>name[]</type>
+       (references <link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attname</structfield>)
+      </para>
+      <para>
+       Names of table columns included in the publication. This contains all
+       the columns of the table when the user didn't specify a column list
+       for the table.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>rowfilter</structfield> <type>text</type>
+      </para>
+      <para>
+       Expression for the table's publication qualifying condition.
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index e2c8bcb..3fe607e 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1082,6 +1082,7 @@ get_publication_name(Oid pubid, bool missing_ok)
 Datum
 pg_get_publication_tables(PG_FUNCTION_ARGS)
 {
+#define NUM_PUBLICATION_TABLES_ELEM	3
 	FuncCallContext *funcctx;
 	char	   *pubname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 	Publication *publication;
@@ -1090,6 +1091,7 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 	/* stuff done only on the first call of the function */
 	if (SRF_IS_FIRSTCALL())
 	{
+		TupleDesc	tupdesc;
 		MemoryContext oldcontext;
 
 		/* create a function context for cross-call persistence */
@@ -1136,6 +1138,16 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 				tables = filter_partitions(tables);
 		}
 
+		/* Construct a tuple descriptor for the result rows. */
+		tupdesc = CreateTemplateTupleDesc(NUM_PUBLICATION_TABLES_ELEM);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "relid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "attrs",
+						   INT2VECTOROID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "qual",
+						   PG_NODE_TREEOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
 		funcctx->user_fctx = (void *) tables;
 
 		MemoryContextSwitchTo(oldcontext);
@@ -1147,9 +1159,47 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 
 	if (funcctx->call_cntr < list_length(tables))
 	{
+		HeapTuple	pubtuple = NULL;
+		HeapTuple	rettuple;
 		Oid			relid = list_nth_oid(tables, funcctx->call_cntr);
+		Datum		values[NUM_PUBLICATION_TABLES_ELEM];
+		bool		nulls[NUM_PUBLICATION_TABLES_ELEM];
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(nulls, 0, sizeof(nulls));
+		MemSet(values, 0, sizeof(values));
+
+		publication = GetPublicationByName(pubname, false);
+
+		values[0] = ObjectIdGetDatum(relid);
+
+		pubtuple = SearchSysCacheCopy2(PUBLICATIONRELMAP,
+									   ObjectIdGetDatum(relid),
+									   ObjectIdGetDatum(publication->oid));
+
+		if (HeapTupleIsValid(pubtuple))
+		{
+			/* Lookup the column list attribute. */
+			values[1] = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
+										Anum_pg_publication_rel_prattrs,
+										&(nulls[1]));
+
+			/* Null indicates no filter. */
+			values[2] = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
+										Anum_pg_publication_rel_prqual,
+										&(nulls[2]));
+		}
+		else
+		{
+			nulls[1] = true;
+			nulls[2] = true;
+		}
+
+		rettuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
 
-		SRF_RETURN_NEXT(funcctx, ObjectIdGetDatum(relid));
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(rettuple));
 	}
 
 	SRF_RETURN_DONE(funcctx);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0fc614e..fedaed5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -368,7 +368,15 @@ CREATE VIEW pg_publication_tables AS
     SELECT
         P.pubname AS pubname,
         N.nspname AS schemaname,
-        C.relname AS tablename
+        C.relname AS tablename,
+        ( SELECT array_agg(a.attname ORDER BY a.attnum)
+          FROM unnest(CASE WHEN GPT.attrs IS NOT NULL THEN GPT.attrs
+                      ELSE (SELECT array_agg(g) FROM generate_series(1, C.relnatts) g)
+                      END) k
+               JOIN pg_attribute a
+                    ON (a.attrelid = GPT.relid AND a.attnum = k)
+        ) AS attnames,
+        pg_get_expr(GPT.qual, GPT.relid) AS rowfilter
     FROM pg_publication P,
          LATERAL pg_get_publication_tables(P.pubname) GPT,
          pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index b03e0f5..994c7a0 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -795,15 +795,12 @@ fetch_remote_table_info(char *nspname, char *relname,
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
 						 "SELECT DISTINCT unnest"
-						 "  FROM pg_publication p"
-						 "  LEFT OUTER JOIN pg_publication_rel pr"
-						 "       ON (p.oid = pr.prpubid AND pr.prrelid = %u)"
-						 "  LEFT OUTER JOIN unnest(pr.prattrs) ON TRUE,"
+						 "  FROM pg_publication p,"
 						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
+						 "  LEFT OUTER JOIN unnest(gpt.attrs) ON TRUE"
 						 " WHERE gpt.relid = %u"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
-						 lrel->remoteid,
 						 pub_names.data);
 
 		pubres = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data,
@@ -965,15 +962,12 @@ fetch_remote_table_info(char *nspname, char *relname,
 		/* Check for row filters. */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
-						 "  FROM pg_publication p"
-						 "  LEFT OUTER JOIN pg_publication_rel pr"
-						 "       ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
+						 "SELECT DISTINCT pg_get_expr(gpt.qual, gpt.relid)"
+						 "  FROM pg_publication p,"
 						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
 						 " WHERE gpt.relid = %u"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
-						 lrel->remoteid,
 						 pub_names.data);
 
 		res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, 1, qualRow);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index babe16f..87aa571 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11673,11 +11673,11 @@
   prosrc => 'pg_show_replication_origin_status' },
 
 # publications
-{ oid => '6119', descr => 'get OIDs of tables in a publication',
+{ oid => '6119', descr => 'get information of tables in a publication',
   proname => 'pg_get_publication_tables', prorows => '1000', proretset => 't',
-  provolatile => 's', prorettype => 'oid', proargtypes => 'text',
-  proallargtypes => '{text,oid}', proargmodes => '{i,o}',
-  proargnames => '{pubname,relid}', prosrc => 'pg_get_publication_tables' },
+  provolatile => 's', prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,oid,int2vector,pg_node_tree}', proargmodes => '{i,o,o,o}',
+  proargnames => '{pubname,relid,attrs,qual}', prosrc => 'pg_get_publication_tables' },
 { oid => '6121',
   descr => 'returns whether a relation can be part of a publication',
   proname => 'pg_relation_is_publishable', provolatile => 's',
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 398c0f3..274b37d 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -1585,52 +1585,52 @@ CREATE TABLE sch2.tbl1_part1 PARTITION OF sch1.tbl1 FOR VALUES FROM (1) to (10);
 -- Schema publication that does not include the schema that has the parent table
 CREATE PUBLICATION pub FOR ALL TABLES IN SCHEMA sch2 WITH (PUBLISH_VIA_PARTITION_ROOT=1);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 DROP PUBLICATION pub;
 -- Table publication that does not include the parent table
 CREATE PUBLICATION pub FOR TABLE sch2.tbl1_part1 WITH (PUBLISH_VIA_PARTITION_ROOT=1);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 -- Table publication that includes both the parent table and the child table
 ALTER PUBLICATION pub ADD TABLE sch1.tbl1;
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename 
----------+------------+-----------
- pub     | sch1       | tbl1
+ pubname | schemaname | tablename | attnames | rowfilter 
+---------+------------+-----------+----------+-----------
+ pub     | sch1       | tbl1      | {a}      | 
 (1 row)
 
 DROP PUBLICATION pub;
 -- Schema publication that does not include the schema that has the parent table
 CREATE PUBLICATION pub FOR ALL TABLES IN SCHEMA sch2 WITH (PUBLISH_VIA_PARTITION_ROOT=0);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 DROP PUBLICATION pub;
 -- Table publication that does not include the parent table
 CREATE PUBLICATION pub FOR TABLE sch2.tbl1_part1 WITH (PUBLISH_VIA_PARTITION_ROOT=0);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 -- Table publication that includes both the parent table and the child table
 ALTER PUBLICATION pub ADD TABLE sch1.tbl1;
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 DROP PUBLICATION pub;
@@ -1643,9 +1643,9 @@ CREATE TABLE sch1.tbl1_part3 (a int) PARTITION BY RANGE(a);
 ALTER TABLE sch1.tbl1 ATTACH PARTITION sch1.tbl1_part3 FOR VALUES FROM (20) to (30);
 CREATE PUBLICATION pub FOR ALL TABLES IN SCHEMA sch1 WITH (PUBLISH_VIA_PARTITION_ROOT=1);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename 
----------+------------+-----------
- pub     | sch1       | tbl1
+ pubname | schemaname | tablename | attnames | rowfilter 
+---------+------------+-----------+----------+-----------
+ pub     | sch1       | tbl1      | {a}      | 
 (1 row)
 
 RESET client_min_messages;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 21effe8..fc3cde3 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1437,9 +1437,18 @@ pg_prepared_xacts| SELECT p.transaction,
      LEFT JOIN pg_database d ON ((p.dbid = d.oid)));
 pg_publication_tables| SELECT p.pubname,
     n.nspname AS schemaname,
-    c.relname AS tablename
+    c.relname AS tablename,
+    ( SELECT array_agg(a.attname ORDER BY a.attnum) AS array_agg
+           FROM (unnest(
+                CASE
+                    WHEN (gpt.attrs IS NOT NULL) THEN (gpt.attrs)::integer[]
+                    ELSE ( SELECT array_agg(g.g) AS array_agg
+                       FROM generate_series(1, (c.relnatts)::integer) g(g))
+                END) k(k)
+             JOIN pg_attribute a ON (((a.attrelid = gpt.relid) AND (a.attnum = k.k))))) AS attnames,
+    pg_get_expr(gpt.qual, gpt.relid) AS rowfilter
    FROM pg_publication p,
-    LATERAL pg_get_publication_tables((p.pubname)::text) gpt(relid),
+    LATERAL pg_get_publication_tables((p.pubname)::text) gpt(relid, attrs, qual),
     (pg_class c
      JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
   WHERE (c.oid = gpt.relid);
-- 
2.7.2.windows.1
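The semantic change in pgoutput_column_list_init() (the old union-merge replaced by an equality check, with a column list naming every column normalized to "no list") can be sketched as a toy model. This is illustrative pseudocode in Python, not PostgreSQL code; the function name is invented:

```python
def combine_column_lists(column_lists, relnatts):
    """Toy model of the patched pgoutput_column_list_init() rule.

    Each entry is a set of attribute numbers, or None when a publication
    has no column list (e.g. FOR ALL TABLES).  Returns None to mean
    "replicate all columns", otherwise the single shared column list.
    """
    combined = None
    first = True
    for cols in column_lists:
        # A list naming every column of the table is treated as "no list".
        if cols is not None and len(cols) == relnatts:
            cols = None
        if first:
            combined = cols
            first = False
        elif combined != cols:
            # Mirrors the ERRCODE_FEATURE_NOT_SUPPORTED ereport in the patch.
            raise ValueError("cannot use different column lists for table "
                             "in different publications")
    return combined
```

So (a,b) combined with (a,b) stays (a,b); (a,b,c) on a three-column table combined with "no list" means all columns; and (a,b) combined with (a,c), the case from the original report, now errors out instead of being unioned.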

#62Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#60)
Re: bogus: logical replication rows/cols combinations

On 2022-May-16, Amit Kapila wrote:

Agreed. If we can get columnlist and rowfilter from pg_publication_tables, it
will be more convenient. And I think users that want to fetch the column list
and row filter of a table can also benefit from it.

After the change for this, we will give an error on combining
publications where one of the publications specifies all columns in
the table and the other doesn't provide any column list. We should not
give an error as both mean all columns.

But don't we need to behave the same way for both column lists and row
filters? I understand that some cases with different row filters for
different publications have shown to have weird behavior, so I think
it'd make sense to restrict it in the same way. That would allow us to
extend the behavior in a sensible way when we develop that, instead of
setting in stone now behavior that we regret later.

Few comments:
=================
1.
postgres=# select * from pg_publication_tables;
pubname | schemaname | tablename | columnlist | rowfilter
---------+------------+-----------+------------+-----------
pub1 | public | t1 | |
pub2 | public | t1 | 1 2 | (c3 < 10)
(2 rows)

I think it is better to display column names for columnlist in the
exposed view similar to attnames in the pg_stats_ext view. I think
that will make it easier for users to understand this information.

+1

I think we should change the "descr" to something like: 'get
information of tables in a publication'

+1

3.
+
+ /*
+ * We only throw a warning here so that the subcription can still be
+ * created and let user aware that something is going to fail later and
+ * they can fix the publications afterwards.
+ */
+ if (list_member(tablelist, rv))
+ ereport(WARNING,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use different column lists for table \"%s.%s\" in
different publications",
+    nspname, relname));

Can we extend this comment to explain the case where after Alter
Publication, if the user dumps and restores back the subscription,
there is a possibility that "CREATE SUBSCRIPTION" won't work if we
give ERROR here instead of WARNING?

Yeah, and not only the comment — I think we need to have more in the
warning message itself. How about:

ERROR: cannot use different column lists for table "..." in different publications
DETAIL: The subscription "..." cannot currently be used for replication.

I think this whole affair is a bit sad TBH and I'm sure it'll give us
some grief -- similar to replication slots becoming inactive and nobody
noticing. If a user changes a publication in a way that prevents some
replica from working and the warnings are hidden, they could have
trouble noticing that the replica is stuck.

But I have no better ideas.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#63shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: houzj.fnst@fujitsu.com (#61)
RE: bogus: logical replication rows/cols combinations

On Mon, May 16, 2022 8:34 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:

Attach the new version patch.

Thanks for your patch. Here are some comments:

1. (0001 patch)
/*
* Returns Oids of tables in a publication.
*/
Datum
pg_get_publication_tables(PG_FUNCTION_ARGS)

Should we modify the comment of pg_get_publication_tables() to "Returns
information of tables in a publication"?

2. (0002 patch)

+	 * Note that we don't support the case where column list is different for
+	 * the same table when combining publications. But we still need to check
+	 * all the given publication-table mappings and report an error if any
+	 * publications have different column list.
 	 *
 	 * Multiple publications might have multiple column lists for this
 	 * relation.

I think it would be better if we swap the order of these two paragraphs.

Regards,
Shi yu

#64Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#62)
Re: bogus: logical replication rows/cols combinations

On Mon, May 16, 2022 at 6:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-May-16, Amit Kapila wrote:

Agreed. If we can get columnlist and rowfilter from pg_publication_tables, it
will be more convenient. And I think users that want to fetch the columnlist
and rowfilter of table can also benefit from it.

After the change for this, we will give an error on combining
publications where one of the publications specifies all columns in
the table and the other doesn't provide any columns. We should not
give an error as both mean all columns.

But don't we need to behave the same way for both column lists and row
filters? I understand that some cases with different row filters for
different publications have shown to have weird behavior, so I think
it'd make sense to restrict it in the same way.

I think the case where we are worried about row filter behavior is for
initial table sync where we ignore publication actions and that is
true with and without row filters. See email [1]. We are planning to
document that behavior as a separate patch. The idea we have used for
row filters is similar to what IBM DB2 [2] and Oracle [3] use, where
they allow combining filters with the pub-action (operation (insert,
update, delete) in their case).

I think both column lists and row filters have a different purpose and
we shouldn't try to make them behave in the same way. The main purpose
of introducing a column list is to have statically different shapes on
publisher and subscriber or hide sensitive column data. In both cases,
it doesn't seem to make sense to combine column lists and we didn't
find any other database doing so. OTOH, for row filters, it makes
sense to combine filters for each pub-action as both IBM DB2 and
Oracle seems to be doing.
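For example (a sketch of the row-filter semantics, not output from this
patch): the publisher OR's the qualifying conditions per operation, so

```sql
CREATE TABLE t (a int PRIMARY KEY);
CREATE PUBLICATION p_ins FOR TABLE t WHERE (a > 100) WITH (publish = 'insert');
CREATE PUBLICATION p_all FOR TABLE t WHERE (a < 10);
-- A subscriber of both publications receives INSERTs satisfying
-- (a > 100) OR (a < 10), while UPDATEs and DELETEs are filtered by
-- (a < 10) alone, because only p_all publishes those actions.
```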

3.
+
+ /*
+ * We only throw a warning here so that the subscription can still be
+ * created and to let the user be aware that something is going to fail
+ * later so they can fix the publications afterwards.
+ */
+ if (list_member(tablelist, rv))
+ ereport(WARNING,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use different column lists for table \"%s.%s\" in
different publications",
+    nspname, relname));

Can we extend this comment to explain the case where after Alter
Publication, if the user dumps and restores back the subscription,
there is a possibility that "CREATE SUBSCRIPTION" won't work if we
give ERROR here instead of WARNING?

Yeah, and not only the comment — I think we need to have more in the
warning message itself.

But as mentioned by Hou-San in his last email (pg_dump of subscription
always set (connect = false) which means it won't try to fetch column
list), I think we don't need to give a WARNING here, instead, we can
use ERROR. So, do we need the extra DETAIL (The subscription "..."
cannot currently be used for replication.) as that is implicit for the
ERROR case?
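(For reference, pg_dump emits the subscription roughly as below; with
connect = false no publisher connection is made at restore time, so
fetch_table_list(), and hence the new check, is never reached:)

```sql
-- Simplified shape of pg_dump output for a subscription
-- (connection string is illustrative):
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres host=publisher'
    PUBLICATION pub1, pub2
    WITH (connect = false, slot_name = 'sub1');
```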

I think this whole affair is a bit sad TBH and I'm sure it'll give us
some grief -- similar to replication slots becoming inactive and nobody
noticing. If a user changes a publication in a way that prevents some
replica from working and the warnings are hidden, they could have
trouble noticing that the replica is stuck.

I agree and it seems this is the best we can do for now.

[1]: /messages/by-id/CAA4eK1L_98LF7Db4yFY1PhKKRzoT83xtN41jTS5X+8OeGrAkLw@mail.gmail.com
[2]: https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-log-record-variables
[3]: https://docs.oracle.com/en/cloud/paas/goldengate-cloud/gwuad/selecting-and-filtering-rows.html#GUID-11296A70-D953-4426-8EAA-37C2B4432446

--
With Regards,
Amit Kapila.

#65osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: houzj.fnst@fujitsu.com (#61)
RE: bogus: logical replication rows/cols combinations

On Monday, May 16, 2022 9:34 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:

Attach the new version patch.

Hi,

I have few minor comments.

For v2-0001.

(1) Unnecessary period at the end of column explanation

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>rowfilter</structfield> <type>text</type>
+      </para>
+      <para>
+       Expression for the table's publication qualifying condition.
+      </para></entry>
+     </row>

It seems when we write a simple noun to explain a column,
we don't need to put a period at the end of the explanation.
Kindly change
FROM:
"Expression for the table's publication qualifying condition."
TO:
"Expression for the table's publication qualifying condition"

For v2-0002.

(a) typo in the commit message

Kindly change
FROM:
"In both cases, it doesn't seems to make sense to combine column lists."
TO:
"In both cases, it doesn't seem to make sense to combine column lists."
or "In both cases, it doesn't make sense to combine column lists."

(b) fetch_table_list

+               if (list_member(tablelist, rv))
+                       ereport(ERROR,
+                                       errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                       errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+                                                  nspname, relname));

Kindly add tests for new error paths, when we add them.
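For the record, the new error path can be exercised with something like
the following (connection string elided):

```sql
-- On the publisher:
CREATE TABLE t (a int PRIMARY KEY, b int, c int);
CREATE PUBLICATION p1 FOR TABLE t (a, b);
CREATE PUBLICATION p2 FOR TABLE t (a, c);

-- On the subscriber, this should now fail:
CREATE SUBSCRIPTION s CONNECTION '...' PUBLICATION p1, p2;
-- ERROR:  cannot use different column lists for table "public.t" in different publications
```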

Best Regards,
Takamichi Osumi

#66Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#61)
Re: bogus: logical replication rows/cols combinations

On Mon, May 16, 2022 at 6:04 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Attach the new version patch.

Few minor comments:
==================
1.
+      <para>
+       Names of table columns included in the publication. This contains all
+       the columns of table when user didn't specify column list for the
+       table.
+      </para></entry>

Can we slightly change it to: "Names of table columns included in the
publication. This contains all the columns of the table when the user
didn't specify the column list for the table."

2. The comments below need to be removed from tablesync.c, as we don't
combine column lists after this patch.

* For initial synchronization, column lists can be ignored in following
* cases:
*
* 1) one of the subscribed publications for the table hasn't specified
* any column list
*
* 2) one of the subscribed publications has puballtables set to true
*
* 3) one of the subscribed publications is declared as ALL TABLES IN
* SCHEMA that includes this relation

3.
Note that we don't support the case where column list is different for
+ * the same table when combining publications. But we still need to check
+ * all the given publication-table mappings and report an error if any
+ * publications have different column list.

Can we change this comment to: "Note that we don't support the case
where the column list is different for the same table when combining
publications. But one can later change the publication so we still
need to check all the given publication-table mappings and report an
error if any publications have a different column list."?

4. Can we add a test for different column lists if it is not already there?

--
With Regards,
Amit Kapila.

#67houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#66)
2 attachment(s)
RE: bogus: logical replication rows/cols combinations

On Tuesday, May 17, 2022 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Few minor comments:
==================
1.
+      <para>
+       Names of table columns included in the publication. This contains all
+       the columns of table when user didn't specify column list for the
+       table.
+      </para></entry>

Can we slightly change it to: "Names of table columns included in the
publication. This contains all the columns of the table when the user
didn't specify the column list for the table."

2. The comments below need to be removed from tablesync.c, as we don't
combine column lists after this patch.

* For initial synchronization, column lists can be ignored in following
* cases:
*
* 1) one of the subscribed publications for the table hasn't specified
* any column list
*
* 2) one of the subscribed publications has puballtables set to true
*
* 3) one of the subscribed publications is declared as ALL TABLES IN
* SCHEMA that includes this relation

3.
Note that we don't support the case where column list is different for
+ * the same table when combining publications. But we still need to check
+ * all the given publication-table mappings and report an error if any
+ * publications have different column list.

Can we change this comment to: "Note that we don't support the case
where the column list is different for the same table when combining
publications. But one can later change the publication so we still
need to check all the given publication-table mappings and report an
error if any publications have a different column list."?

4. Can we add a test for different column lists if it is not already there?

Thanks for the comments.

Attach the new version patch which addresses all the above comments and
the comments from Shi yu [1] and Osumi-san [2].

[1]: /messages/by-id/OSZPR01MB6310F32344884F9C12F45071FDCE9@OSZPR01MB6310.jpnprd01.prod.outlook.com
[2]: /messages/by-id/TYCPR01MB83736AEC2493FCBB75CC7556EDCE9@TYCPR01MB8373.jpnprd01.prod.outlook.com

Best regards,
Hou zj

Attachments:

v3-0001-Extend-pg_publication_tables-to-display-column-list-.patchapplication/octet-stream; name=v3-0001-Extend-pg_publication_tables-to-display-column-list-.patchDownload
From 87104fadcbdba46874a52e72e844ed4c3a73de1d Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Fri, 13 May 2022 11:35:14 +0800
Subject: [PATCH] Extend pg_publication_tables to display column list and row
 filter.

Commit 923def9a53 and 52e4f0cd47 allowed to specify column lists and row
filters for publication tables. This commit extends the
pg_publication_tables view and pg_get_publication_tables function to
display that information.

This information will be useful to users and we also need this for the
later commit that prohibits combining multiple publications with different
column lists for the same table.

Author: Hou Zhijie
Reviewed By: Amit Kapila, Alvaro Herrera, Shi Yu, Takamichi Osumi
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
---
 doc/src/sgml/catalogs.sgml                  | 27 +++++++++++++--
 src/backend/catalog/pg_publication.c        | 54 +++++++++++++++++++++++++++--
 src/backend/catalog/system_views.sql        | 10 +++++-
 src/backend/replication/logical/tablesync.c | 14 +++-----
 src/include/catalog/pg_proc.dat             |  8 ++---
 src/test/regress/expected/publication.out   | 42 +++++++++++-----------
 src/test/regress/expected/rules.out         | 13 +++++--
 7 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a533a21..d96c72e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9691,7 +9691,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 
      <row>
       <entry><link linkend="view-pg-publication-tables"><structname>pg_publication_tables</structname></link></entry>
-      <entry>publications and their associated tables</entry>
+      <entry>publications and information of their associated tables</entry>
      </row>
 
      <row>
@@ -11635,8 +11635,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   <para>
    The view <structname>pg_publication_tables</structname> provides
-   information about the mapping between publications and the tables they
-   contain.  Unlike the underlying catalog
+   information about the mapping between publications and information of
+   tables they contain.  Unlike the underlying catalog
    <link linkend="catalog-pg-publication-rel"><structname>pg_publication_rel</structname></link>,
    this view expands publications defined as <literal>FOR ALL TABLES</literal>
    and <literal>FOR ALL TABLES IN SCHEMA</literal>, so for such publications
@@ -11687,6 +11687,27 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        Name of table
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>attnames</structfield> <type>name[]</type>
+       (references <link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attname</structfield>)
+      </para>
+      <para>
+       Names of table columns included in the publication. This contains all
+       the columns of the table when the user didn't specify the column list
+       for the table.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>rowfilter</structfield> <type>text</type>
+      </para>
+      <para>
+       Expression for the table's publication qualifying condition
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index e2c8bcb..152b4ba 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1077,11 +1077,12 @@ get_publication_name(Oid pubid, bool missing_ok)
 }
 
 /*
- * Returns Oids of tables in a publication.
+ * Returns information of tables in a publication.
  */
 Datum
 pg_get_publication_tables(PG_FUNCTION_ARGS)
 {
+#define NUM_PUBLICATOIN_TABLES_ELEM	3
 	FuncCallContext *funcctx;
 	char	   *pubname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 	Publication *publication;
@@ -1090,6 +1091,7 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 	/* stuff done only on the first call of the function */
 	if (SRF_IS_FIRSTCALL())
 	{
+		TupleDesc	tupdesc;
 		MemoryContext oldcontext;
 
 		/* create a function context for cross-call persistence */
@@ -1136,6 +1138,16 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 				tables = filter_partitions(tables);
 		}
 
+		/* Construct a tuple descriptor for the result rows. */
+		tupdesc = CreateTemplateTupleDesc(NUM_PUBLICATOIN_TABLES_ELEM);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "relid",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "attrs",
+						   INT2VECTOROID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "qual",
+						   PG_NODE_TREEOID, -1, 0);
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
 		funcctx->user_fctx = (void *) tables;
 
 		MemoryContextSwitchTo(oldcontext);
@@ -1147,9 +1159,47 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 
 	if (funcctx->call_cntr < list_length(tables))
 	{
+		HeapTuple	pubtuple = NULL;
+		HeapTuple	rettuple;
 		Oid			relid = list_nth_oid(tables, funcctx->call_cntr);
+		Datum		values[NUM_PUBLICATOIN_TABLES_ELEM];
+		bool		nulls[NUM_PUBLICATOIN_TABLES_ELEM];
+
+		/*
+		 * Form tuple with appropriate data.
+		 */
+		MemSet(nulls, 0, sizeof(nulls));
+		MemSet(values, 0, sizeof(values));
+
+		publication = GetPublicationByName(pubname, false);
+
+		values[0] = ObjectIdGetDatum(relid);
+
+		pubtuple = SearchSysCacheCopy2(PUBLICATIONRELMAP,
+									   ObjectIdGetDatum(relid),
+									   ObjectIdGetDatum(publication->oid));
+
+		if (HeapTupleIsValid(pubtuple))
+		{
+			/* Lookup the column list attribute. */
+			values[1] = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
+										Anum_pg_publication_rel_prattrs,
+										&(nulls[1]));
+
+			/* Null indicates no filter. */
+			values[2] = SysCacheGetAttr(PUBLICATIONRELMAP, pubtuple,
+									  Anum_pg_publication_rel_prqual,
+									  &(nulls[2]));
+		}
+		else
+		{
+			nulls[1] = true;
+			nulls[2] = true;
+		}
+
+		rettuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
 
-		SRF_RETURN_NEXT(funcctx, ObjectIdGetDatum(relid));
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(rettuple));
 	}
 
 	SRF_RETURN_DONE(funcctx);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0fc614e..fedaed5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -368,7 +368,15 @@ CREATE VIEW pg_publication_tables AS
     SELECT
         P.pubname AS pubname,
         N.nspname AS schemaname,
-        C.relname AS tablename
+        C.relname AS tablename,
+        ( SELECT array_agg(a.attname ORDER BY a.attnum)
+          FROM unnest(CASE WHEN GPT.attrs IS NOT NULL THEN GPT.attrs
+                      ELSE (SELECT array_agg(g) FROM generate_series(1, C.relnatts) g)
+                      END) k
+               JOIN pg_attribute a
+                    ON (a.attrelid = GPT.relid AND a.attnum = k)
+        ) AS attnames,
+        pg_get_expr(GPT.qual, GPT.relid) AS rowfilter
     FROM pg_publication P,
          LATERAL pg_get_publication_tables(P.pubname) GPT,
          pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index b03e0f5..994c7a0 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -795,15 +795,12 @@ fetch_remote_table_info(char *nspname, char *relname,
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
 						 "SELECT DISTINCT unnest"
-						 "  FROM pg_publication p"
-						 "  LEFT OUTER JOIN pg_publication_rel pr"
-						 "       ON (p.oid = pr.prpubid AND pr.prrelid = %u)"
-						 "  LEFT OUTER JOIN unnest(pr.prattrs) ON TRUE,"
+						 "  FROM pg_publication p,"
 						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
+						 "  LEFT OUTER JOIN unnest(gpt.attrs) ON TRUE"
 						 " WHERE gpt.relid = %u"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
-						 lrel->remoteid,
 						 pub_names.data);
 
 		pubres = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data,
@@ -965,15 +962,12 @@ fetch_remote_table_info(char *nspname, char *relname,
 		/* Check for row filters. */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
-						 "  FROM pg_publication p"
-						 "  LEFT OUTER JOIN pg_publication_rel pr"
-						 "       ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
+						 "SELECT DISTINCT pg_get_expr(gpt.qual, gpt.relid)"
+						 "  FROM pg_publication p,"
 						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
 						 " WHERE gpt.relid = %u"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
-						 lrel->remoteid,
 						 pub_names.data);
 
 		res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, 1, qualRow);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index babe16f..87aa571 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11673,11 +11673,11 @@
   prosrc => 'pg_show_replication_origin_status' },
 
 # publications
-{ oid => '6119', descr => 'get OIDs of tables in a publication',
+{ oid => '6119', descr => 'get information of tables in a publication',
   proname => 'pg_get_publication_tables', prorows => '1000', proretset => 't',
-  provolatile => 's', prorettype => 'oid', proargtypes => 'text',
-  proallargtypes => '{text,oid}', proargmodes => '{i,o}',
-  proargnames => '{pubname,relid}', prosrc => 'pg_get_publication_tables' },
+  provolatile => 's', prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,oid,int2vector,pg_node_tree}', proargmodes => '{i,o,o,o}',
+  proargnames => '{pubname,relid,attrs,qual}', prosrc => 'pg_get_publication_tables' },
 { oid => '6121',
   descr => 'returns whether a relation can be part of a publication',
   proname => 'pg_relation_is_publishable', provolatile => 's',
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 398c0f3..274b37d 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -1585,52 +1585,52 @@ CREATE TABLE sch2.tbl1_part1 PARTITION OF sch1.tbl1 FOR VALUES FROM (1) to (10);
 -- Schema publication that does not include the schema that has the parent table
 CREATE PUBLICATION pub FOR ALL TABLES IN SCHEMA sch2 WITH (PUBLISH_VIA_PARTITION_ROOT=1);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 DROP PUBLICATION pub;
 -- Table publication that does not include the parent table
 CREATE PUBLICATION pub FOR TABLE sch2.tbl1_part1 WITH (PUBLISH_VIA_PARTITION_ROOT=1);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 -- Table publication that includes both the parent table and the child table
 ALTER PUBLICATION pub ADD TABLE sch1.tbl1;
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename 
----------+------------+-----------
- pub     | sch1       | tbl1
+ pubname | schemaname | tablename | attnames | rowfilter 
+---------+------------+-----------+----------+-----------
+ pub     | sch1       | tbl1      | {a}      | 
 (1 row)
 
 DROP PUBLICATION pub;
 -- Schema publication that does not include the schema that has the parent table
 CREATE PUBLICATION pub FOR ALL TABLES IN SCHEMA sch2 WITH (PUBLISH_VIA_PARTITION_ROOT=0);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 DROP PUBLICATION pub;
 -- Table publication that does not include the parent table
 CREATE PUBLICATION pub FOR TABLE sch2.tbl1_part1 WITH (PUBLISH_VIA_PARTITION_ROOT=0);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 -- Table publication that includes both the parent table and the child table
 ALTER PUBLICATION pub ADD TABLE sch1.tbl1;
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename  
----------+------------+------------
- pub     | sch2       | tbl1_part1
+ pubname | schemaname | tablename  | attnames | rowfilter 
+---------+------------+------------+----------+-----------
+ pub     | sch2       | tbl1_part1 | {a}      | 
 (1 row)
 
 DROP PUBLICATION pub;
@@ -1643,9 +1643,9 @@ CREATE TABLE sch1.tbl1_part3 (a int) PARTITION BY RANGE(a);
 ALTER TABLE sch1.tbl1 ATTACH PARTITION sch1.tbl1_part3 FOR VALUES FROM (20) to (30);
 CREATE PUBLICATION pub FOR ALL TABLES IN SCHEMA sch1 WITH (PUBLISH_VIA_PARTITION_ROOT=1);
 SELECT * FROM pg_publication_tables;
- pubname | schemaname | tablename 
----------+------------+-----------
- pub     | sch1       | tbl1
+ pubname | schemaname | tablename | attnames | rowfilter 
+---------+------------+-----------+----------+-----------
+ pub     | sch1       | tbl1      | {a}      | 
 (1 row)
 
 RESET client_min_messages;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 21effe8..fc3cde3 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1437,9 +1437,18 @@ pg_prepared_xacts| SELECT p.transaction,
      LEFT JOIN pg_database d ON ((p.dbid = d.oid)));
 pg_publication_tables| SELECT p.pubname,
     n.nspname AS schemaname,
-    c.relname AS tablename
+    c.relname AS tablename,
+    ( SELECT array_agg(a.attname ORDER BY a.attnum) AS array_agg
+           FROM (unnest(
+                CASE
+                    WHEN (gpt.attrs IS NOT NULL) THEN (gpt.attrs)::integer[]
+                    ELSE ( SELECT array_agg(g.g) AS array_agg
+                       FROM generate_series(1, (c.relnatts)::integer) g(g))
+                END) k(k)
+             JOIN pg_attribute a ON (((a.attrelid = gpt.relid) AND (a.attnum = k.k))))) AS attnames,
+    pg_get_expr(gpt.qual, gpt.relid) AS rowfilter
    FROM pg_publication p,
-    LATERAL pg_get_publication_tables((p.pubname)::text) gpt(relid),
+    LATERAL pg_get_publication_tables((p.pubname)::text) gpt(relid, attrs, qual),
     (pg_class c
      JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
   WHERE (c.oid = gpt.relid);
-- 
2.7.2.windows.1

v3-0002-Prohibit-combining-publications-with-different-colum.patchapplication/octet-stream; name=v3-0002-Prohibit-combining-publications-with-different-colum.patchDownload
From 91a4774b9f351cb09f461ee920be9031cc4e073e Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Fri, 13 May 2022 13:19:49 +0800
Subject: [PATCH] Prohibit combining publications with different column lists.

The main purpose of introducing a column list is to have statically
different shapes on publisher and subscriber or hide sensitive column
data. In both cases, it doesn't seem to make sense to combine column
lists.

So, we disallow the cases where the column list is different for the same
table when combining publications. It can be later extended to combine the
column lists for selective cases where required.

Reported-by: Alvaro Herrera
Author: Hou Zhijie
Reviewed-by: Amit Kapila, Alvaro Herrera, Shi Yu, Takamichi Osumi
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
---
 doc/src/sgml/ref/alter_publication.sgml     |  12 ++-
 doc/src/sgml/ref/create_subscription.sgml   |   5 +
 src/backend/commands/subscriptioncmds.c     |  23 ++++-
 src/backend/replication/logical/tablesync.c |  72 ++++++-------
 src/backend/replication/pgoutput/pgoutput.c |  80 +++++++--------
 src/test/subscription/t/031_column_list.pl  | 150 ++++++++++++----------------
 6 files changed, 176 insertions(+), 166 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index e2cce49..f03933a 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -116,7 +116,17 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
 
      <para>
       Optionally, a column list can be specified.  See <xref
-      linkend="sql-createpublication"/> for details.
+      linkend="sql-createpublication"/> for details. Note that a subscription
+      having several publications in which the same table has been published
+      with different column lists is not supported. So, changing the column
+      lists of the tables being subscribed could cause inconsistency of column
+      lists among publications in which case <command>ALTER PUBLICATION</command>
+      command will be successful but later the WalSender in publisher or the
+      subscriber may throw an error. In this scenario, the user needs to
+      recreate the subscription after adjusting the column list or drop the
+      problematic publication using
+      <literal>ALTER SUBSCRIPTION ... DROP PUBLICATION</literal> and then add
+      it back after adjusting the column list.
      </para>
 
      <para>
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 203bb41..f6f82a0 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -356,6 +356,11 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
   </para>
 
   <para>
+   Subscription having several publications in which the same table has been
+   published with different column lists is not supported.
+  </para>
+
+  <para>
    We allow non-existent publications to be specified so that users can add
    those later. This means
    <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 690cdaa..991b2c1 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1753,7 +1753,8 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
 
 /*
  * Get the list of tables which belong to specified publications on the
- * publisher connection.
+ * publisher connection. Also get the column list for each table and check if
+ * column lists are the same in different publications.
  */
 static List *
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
@@ -1761,17 +1762,18 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 	WalRcvExecResult *res;
 	StringInfoData cmd;
 	TupleTableSlot *slot;
-	Oid			tableRow[2] = {TEXTOID, TEXTOID};
+	Oid			tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
 	List	   *tablelist = NIL;
 
 	initStringInfo(&cmd);
-	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
+	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename, \n"
+						   "                      t.attnames\n"
 						   "  FROM pg_catalog.pg_publication_tables t\n"
 						   " WHERE t.pubname IN (");
 	get_publications_str(publications, &cmd, true);
 	appendStringInfoChar(&cmd, ')');
 
-	res = walrcv_exec(wrconn, cmd.data, 2, tableRow);
+	res = walrcv_exec(wrconn, cmd.data, 3, tableRow);
 	pfree(cmd.data);
 
 	if (res->status != WALRCV_OK_TUPLES)
@@ -1795,7 +1797,18 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 		Assert(!isnull);
 
 		rv = makeRangeVar(nspname, relname, -1);
-		tablelist = lappend(tablelist, rv);
+
+		/*
+		 * We don't support the case where column list is different for the
+		 * same table in different publications.
+		 */
+		if (list_member(tablelist, rv))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+		else
+			tablelist = lappend(tablelist, rv);
 
 		ExecClearTuple(slot);
 	}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 994c7a0..afc5bcd 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -753,17 +753,6 @@ fetch_remote_table_info(char *nspname, char *relname,
 	/*
 	 * Get column lists for each relation.
 	 *
-	 * For initial synchronization, column lists can be ignored in following
-	 * cases:
-	 *
-	 * 1) one of the subscribed publications for the table hasn't specified
-	 * any column list
-	 *
-	 * 2) one of the subscribed publications has puballtables set to true
-	 *
-	 * 3) one of the subscribed publications is declared as ALL TABLES IN
-	 * SCHEMA that includes this relation
-	 *
 	 * We need to do this before fetching info about column names and types,
 	 * so that we can skip columns that should not be replicated.
 	 */
@@ -771,7 +760,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	{
 		WalRcvExecResult *pubres;
 		TupleTableSlot *slot;
-		Oid			attrsRow[] = {INT2OID};
+		Oid			attrsRow[] = {INT2VECTOROID};
 		StringInfoData pub_names;
 		bool		first = true;
 
@@ -786,19 +775,17 @@ fetch_remote_table_info(char *nspname, char *relname,
 
 		/*
 		 * Fetch info about column lists for the relation (from all the
-		 * publications). We unnest the int2vector values, because that makes
-		 * it easier to combine lists by simply adding the attnums to a new
-		 * bitmap (without having to parse the int2vector data). This
-		 * preserves NULL values, so that if one of the publications has no
-		 * column list, we'll know that.
+		 * publications).
 		 */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT unnest"
+						 "SELECT DISTINCT"
+						 "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
+						 "   THEN NULL ELSE gpt.attrs END)"
 						 "  FROM pg_publication p,"
-						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
-						 "  LEFT OUTER JOIN unnest(gpt.attrs) ON TRUE"
-						 " WHERE gpt.relid = %u"
+						 "  LATERAL pg_get_publication_tables(p.pubname) gpt,"
+						 "  pg_class c"
+						 " WHERE gpt.relid = %u AND c.oid = gpt.relid"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
 						 pub_names.data);
@@ -813,26 +800,43 @@ fetch_remote_table_info(char *nspname, char *relname,
 							nspname, relname, pubres->err)));
 
 		/*
-		 * Merge the column lists (from different publications) by creating a
-		 * single bitmap with all the attnums. If we find a NULL value, that
-		 * means one of the publications has no column list for the table
-		 * we're syncing.
+		 * We don't support the case where the column list is different for the
+		 * same table when combining publications. So there should be only one
+		 * row returned. Although we already checked this when creating the
+		 * subscription, we still need to check here in case the column list
+		 * was changed after creating the subscription and before the sync
+		 * worker is started.
+		 */
+		if (tuplestore_tuple_count(pubres->tuplestore) > 1)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+
+		/*
+		 * Get the column list and build a single bitmap with the attnums.
+		 *
+		 * If we find a NULL value, it means all the columns should be
+		 * replicated.
 		 */
 		slot = MakeSingleTupleTableSlot(pubres->tupledesc, &TTSOpsMinimalTuple);
-		while (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
+		if (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
 		{
 			Datum		cfval = slot_getattr(slot, 1, &isnull);
 
-			/* NULL means empty column list, so we're done. */
-			if (isnull)
+			if (!isnull)
 			{
-				bms_free(included_cols);
-				included_cols = NULL;
-				break;
-			}
+				ArrayType  *arr;
+				int			nelems;
+				int16	   *elems;
 
-			included_cols = bms_add_member(included_cols,
-										   DatumGetInt16(cfval));
+				arr = DatumGetArrayTypeP(cfval);
+				nelems = ARR_DIMS(arr)[0];
+				elems = (int16 *) ARR_DATA_PTR(arr);
+
+				for (natt = 0; natt < nelems; natt++)
+					included_cols = bms_add_member(included_cols, elems[natt]);
+			}
 
 			ExecClearTuple(slot);
 		}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 42c06af..e9079bd 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -979,30 +979,31 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 						  RelationSyncEntry *entry)
 {
 	ListCell   *lc;
+	bool		first = true;
+	Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
 
 	/*
 	 * Find if there are any column lists for this relation. If there are,
-	 * build a bitmap merging all the column lists.
-	 *
-	 * All the given publication-table mappings must be checked.
+	 * build a bitmap using the column lists.
 	 *
 	 * Multiple publications might have multiple column lists for this
 	 * relation.
 	 *
+	 * Note that we don't support the case where the column list is different
+	 * for the same table when combining publications. But one can later change
+	 * the publication so we still need to check all the given
+	 * publication-table mappings and report an error if any publications have
+	 * a different column list.
+	 *
 	 * FOR ALL TABLES and FOR ALL TABLES IN SCHEMA implies "don't use column
-	 * list" so it takes precedence.
+	 * list".
 	 */
 	foreach(lc, publications)
 	{
 		Publication *pub = lfirst(lc);
 		HeapTuple	cftuple = NULL;
 		Datum		cfdatum = 0;
-
-		/*
-		 * Assume there's no column list. Only if we find pg_publication_rel
-		 * entry with a column list we'll switch it to false.
-		 */
-		bool		pub_no_list = true;
+		Bitmapset	*cols = NULL;
 
 		/*
 		 * If the publication is FOR ALL TABLES then it is treated the same as
@@ -1011,6 +1012,8 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		 */
 		if (!pub->alltables)
 		{
+			bool		pub_no_list = true;
+
 			/*
 			 * Check for the presence of a column list in this publication.
 			 *
@@ -1024,51 +1027,48 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 
 			if (HeapTupleIsValid(cftuple))
 			{
-				/*
-				 * Lookup the column list attribute.
-				 *
-				 * Note: We update the pub_no_list value directly, because if
-				 * the value is NULL, we have no list (and vice versa).
-				 */
+				/* Lookup the column list attribute. */
 				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
 										  Anum_pg_publication_rel_prattrs,
 										  &pub_no_list);
 
-				/*
-				 * Build the column list bitmap in the per-entry context.
-				 *
-				 * We need to merge column lists from all publications, so we
-				 * update the same bitmapset. If the column list is null, we
-				 * interpret it as replicating all columns.
-				 */
+				/* Build the column list bitmap in the per-entry context. */
 				if (!pub_no_list)	/* when not null */
 				{
 					pgoutput_ensure_entry_cxt(data, entry);
 
-					entry->columns = pub_collist_to_bitmapset(entry->columns,
-															  cfdatum,
-															  entry->entry_cxt);
+					cols = pub_collist_to_bitmapset(cols, cfdatum,
+													entry->entry_cxt);
+
+					/*
+					 * If column list includes all the columns of the table,
+					 * set it to NULL.
+					 */
+					if (bms_num_members(cols) == RelationGetNumberOfAttributes(relation))
+					{
+						bms_free(cols);
+						cols = NULL;
+					}
 				}
+
+				ReleaseSysCache(cftuple);
 			}
 		}
 
-		/*
-		 * Found a publication with no column list, so we're done. But first
-		 * discard column list we might have from preceding publications.
-		 */
-		if (pub_no_list)
+		if (first)
 		{
-			if (cftuple)
-				ReleaseSysCache(cftuple);
-
-			bms_free(entry->columns);
-			entry->columns = NULL;
-
-			break;
+			entry->columns = cols;
+			first = false;
 		}
-
-		ReleaseSysCache(cftuple);
+		else if (!bms_equal(entry->columns, cols))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   get_namespace_name(RelationGetNamespace(relation)),
+						   RelationGetRelationName(relation)));
 	}							/* loop all subscribed publications */
+
+	RelationClose(relation);
 }
 
 /*
diff --git a/src/test/subscription/t/031_column_list.pl b/src/test/subscription/t/031_column_list.pl
index 19812e1..7f031bc 100644
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -20,6 +20,7 @@ $node_subscriber->append_conf('postgresql.conf',
 $node_subscriber->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+my $offset = 0;
 
 sub wait_for_subscription_sync
 {
@@ -361,13 +362,13 @@ is( $result, qq(1|abc
 2|xyz), 'update on column tab2.c is not replicated');
 
 
-# TEST: add a table to two publications with different column lists, and
+# TEST: add a table to two publications with the same column list, and
 # create a single subscription replicating both publications
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE tab5 (a int PRIMARY KEY, b int, c int, d int);
 	CREATE PUBLICATION pub2 FOR TABLE tab5 (a, b);
-	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, d);
+	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, b);
 
 	-- insert a couple initial records
 	INSERT INTO tab5 VALUES (1, 11, 111, 1111);
@@ -388,8 +389,7 @@ wait_for_subscription_sync($node_subscriber);
 
 $node_publisher->wait_for_catchup('sub1');
 
-# insert data and make sure all the columns (union of the columns lists)
-# get fully replicated
+# insert data and make sure the columns in column list get fully replicated
 $node_publisher->safe_psql(
 	'postgres', qq(
 	INSERT INTO tab5 VALUES (3, 33, 333, 3333);
@@ -399,42 +399,12 @@ $node_publisher->safe_psql(
 $node_publisher->wait_for_catchup('sub1');
 
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111
-2|22|2222
-3|33|3333
-4|44|4444),
+	qq(1|11|
+2|22|
+3|33|
+4|44|),
 	'overlapping publications with overlapping column lists');
 
-# and finally, remove the column list for one of the publications, which
-# means replicating all columns (removing the column list), but first add
-# the missing column to the table on subscriber
-$node_publisher->safe_psql(
-	'postgres', qq(
-	ALTER PUBLICATION pub3 SET TABLE tab5;
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;
-	ALTER TABLE tab5 ADD COLUMN c INT;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO tab5 VALUES (5, 55, 555, 5555);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111|
-2|22|2222|
-3|33|3333|
-4|44|4444|
-5|55|5555|555),
-	'overlapping publications with overlapping column lists');
 
 # TEST: create a table with a column list, then change the replica
 # identity by replacing a primary key (but use a different column in
@@ -900,57 +870,21 @@ is( $node_subscriber->safe_psql(
 3|),
 	'partitions with different replica identities not replicated correctly');
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. So with column lists (a,b) and (a,c) we
-# should replicate (a,b,c).
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
-	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
-
-	-- initial data
-	INSERT INTO test_mix_1 VALUES (1, 2, 3);
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_mix_1, pub_mix_2;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO test_mix_1 VALUES (4, 5, 6);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql(
-		'postgres', "SELECT * FROM test_mix_1 ORDER BY a"),
-	qq(1|2|3
-4|5|6),
-	'a mix of publications should use a union of column list');
-
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES, we should replicate all columns.
+# TEST: With a table included in a publication that is FOR ALL TABLES, all
+# columns are replicated.
 
 # drop unnecessary tables, so as not to interfere with the FOR ALL TABLES
 $node_publisher->safe_psql(
 	'postgres', qq(
-	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7, test_mix_1,
+	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7,
 			   test_part, test_part_a, test_part_b, test_part_c, test_part_d;
 ));
 
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE test_mix_2 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b);
+	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b, c);
 	CREATE PUBLICATION pub_mix_4 FOR ALL TABLES;
 
 	-- initial data
@@ -976,12 +910,11 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_2"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES IN SCHEMA, we should replicate all columns.
+# TEST: With a table included in a publication that is FOR ALL TABLES
+# IN SCHEMA, all columns are replicated.
 
 $node_subscriber->safe_psql(
 	'postgres', qq(
@@ -993,7 +926,7 @@ $node_publisher->safe_psql(
 	'postgres', qq(
 	DROP TABLE test_mix_2;
 	CREATE TABLE test_mix_3 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b);
+	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b, c);
 	CREATE PUBLICATION pub_mix_6 FOR ALL TABLES IN SCHEMA public;
 
 	-- initial data
@@ -1017,7 +950,7 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_3"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
 # TEST: Check handling of publish_via_partition_root - if a partition is
@@ -1074,7 +1007,7 @@ is( $node_subscriber->safe_psql(
 # TEST: Multiple publications which publish schema of parent table and
 # partition. The partition is published through two publications, once
 # through a schema (so no column list) containing the parent, and then
-# also directly (with a columns list). The expected outcome is there is
+# also directly (with all columns). The expected outcome is there is
 # no column list.
 
 $node_publisher->safe_psql(
@@ -1086,7 +1019,7 @@ $node_publisher->safe_psql(
 	CREATE TABLE t_1 PARTITION OF s1.t FOR VALUES FROM (1) TO (10);
 
 	CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA s1;
-	CREATE PUBLICATION pub2 FOR TABLE t_1(b);
+	CREATE PUBLICATION pub2 FOR TABLE t_1(a, b, c);
 
 	-- initial data
 	INSERT INTO s1.t VALUES (1, 2, 3);
@@ -1233,6 +1166,51 @@ is( $node_subscriber->safe_psql(
 	'publication containing both parent and child relation');
 
 
+# TEST: With a table included in multiple publications with different column
+# lists, we should catch the error when creating the subscription.
+
+$node_publisher->safe_psql('postgres', qq(
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
+	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	DROP SUBSCRIPTION sub1;
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+));
+
+my ($cmdret, $stdout, $stderr) = $node_subscriber->psql(
+	'postgres', qq(
+	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_1, pub_mix_2;
+));
+
+ok( $stderr =~
+	  qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	'different column lists detected');
+
+# TEST: If the column list is changed after creating the subscription, we
+# should catch the error reported by walsender.
+
+$node_publisher->safe_psql('postgres', qq(
+	ALTER PUBLICATION pub_mix_1 SET TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_1, pub_mix_2;
+));
+
+$node_publisher->wait_for_catchup('sub1');
+
+$node_publisher->safe_psql('postgres', qq(
+	ALTER PUBLICATION pub_mix_1 SET TABLE test_mix_1 (a, b);
+	INSERT INTO test_mix_1 VALUES(1, 1, 1);
+));
+
+$offset = $node_publisher->wait_for_log(
+	qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	$offset);
+
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
 
-- 
2.7.2.windows.1

#68Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#67)
Re: bogus: logical replication rows/cols combinations

On Tue, May 17, 2022 at 2:40 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Attach the new version patch which addressed all the above comments and
comments from Shi yu[1] and Osumi-san[2].

Thanks, your first patch looks good to me. I'll commit that tomorrow
unless there are more comments on the same. The second one is also in
good shape but I would like to test it a bit more and also see if
others have any suggestions/objections on the same.

--
With Regards,
Amit Kapila.

#69Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#62)
Re: bogus: logical replication rows/cols combinations

On Mon, May 16, 2022 at 6:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-May-16, Amit Kapila wrote:

Few comments:
=================
1.
postgres=# select * from pg_publication_tables;
pubname | schemaname | tablename | columnlist | rowfilter
---------+------------+-----------+------------+-----------
pub1 | public | t1 | |
pub2 | public | t1 | 1 2 | (c3 < 10)
(2 rows)

I think it is better to display column names for columnlist in the
exposed view similar to attnames in the pg_stats_ext view. I think
that will make it easier for users to understand this information.

+1

I have committed the first patch after fixing this part. It seems Tom
is not very happy doing this after beta-1 [1]. The reason we get this
information via this view (and underlying function) is that it
simplifies the queries on the subscriber-side as you can see in the
second patch. The query change is as below:
@@ -1761,17 +1762,18 @@ fetch_table_list(WalReceiverConn *wrconn, List
*publications)
  WalRcvExecResult *res;
  StringInfoData cmd;
  TupleTableSlot *slot;
- Oid tableRow[2] = {TEXTOID, TEXTOID};
+ Oid tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
  List    *tablelist = NIL;
  initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
+ appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename, \n"
+    "                      t.attnames\n"
     "  FROM pg_catalog.pg_publication_tables t\n"
     " WHERE t.pubname IN (");
Now, there is another way to change this query as well as done by
Hou-San in his first version [2] of the patch. The changed query with
that approach will be something like:
@@ -1761,17 +1762,34 @@ fetch_table_list(WalReceiverConn *wrconn, List
*publications)
  WalRcvExecResult *res;
  StringInfoData cmd;
  TupleTableSlot *slot;
- Oid tableRow[2] = {TEXTOID, TEXTOID};
+ Oid tableRow[3] = {TEXTOID, TEXTOID, INT2VECTOROID};
  List    *tablelist = NIL;
  initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
-    "  FROM pg_catalog.pg_publication_tables t\n"
+ appendStringInfoString(&cmd,
+    "SELECT DISTINCT t.schemaname,\n"
+    "                t.tablename,\n"
+    "                (CASE WHEN (array_length(pr.prattrs, 1) = t.relnatts)\n"
+    "                THEN NULL ELSE pr.prattrs END)\n"
+    "  FROM (SELECT P.pubname AS pubname,\n"
+    "               N.nspname AS schemaname,\n"
+    "               C.relname AS tablename,\n"
+    "               P.oid AS pubid,\n"
+    "               C.oid AS reloid,\n"
+    "               C.relnatts\n"
+    "          FROM pg_publication P,\n"
+    "          LATERAL pg_get_publication_tables(P.pubname) GPT,\n"
+    "          pg_class C JOIN pg_namespace N\n"
+    "                     ON (N.oid = C.relnamespace)\n"
+    "          WHERE C.oid = GPT.relid) t\n"
+    "  LEFT OUTER JOIN pg_publication_rel pr\n"
+    "       ON (t.pubid = pr.prpubid AND\n"
+    "        pr.prrelid = reloid)\n"

That query appeared slightly complex and costly to me, so I suggested
changing it to what we now have in the second patch, as shown above.
Now, I can think of the following ways to proceed here:

a. Revert the change in view (and underlying function) as done in
commit 0ff20288e1 and consider the alternate way (using a slightly
complex query) to fix. Then maybe for PG-16, we can simplify it by
changing the underlying function and view.
b. Proceed with the current approach of using a simplified query.

What do you think?

[1]: /messages/by-id/91075.1652929852@sss.pgh.pa.us
[2]: /messages/by-id/OS0PR01MB5716A594C58DE4FFD1F8100B94C89@OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

#70Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#69)
Re: bogus: logical replication rows/cols combinations

On Thu, May 19, 2022 at 10:33:13AM +0530, Amit Kapila wrote:

I have committed the first patch after fixing this part. It seems Tom
is not very happy doing this after beta-1 [1]. The reason we get this
information via this view (and underlying function) is that it
simplifies the queries on the subscriber-side as you can see in the
second patch. The query change is as below:
[1] - /messages/by-id/91075.1652929852@sss.pgh.pa.us

I think Tom's concern is that adding information to a view seems like adding a
feature that hadn't previously been contemplated.
(Catalog changes themselves are not prohibited during the beta period).

a. Revert the change in view (and underlying function) as done in
commit 0ff20288e1 and consider the alternate way (using a slightly
complex query) to fix. Then maybe for PG-16, we can simplify it by
changing the underlying function and view.

But, ISTM that it makes no sense to do it differently for v15 just to avoid the
appearance of adding a new feature, only to re-do it in 2 weeks for v16...
So (from a passive observer) +0.1 to keep the current patch.

I have some minor language fixes to that patch.

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d96c72e5310..82aa84e96e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9691,7 +9691,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
      <row>
       <entry><link linkend="view-pg-publication-tables"><structname>pg_publication_tables</structname></link></entry>
-      <entry>publications and information of their associated tables</entry>
+      <entry>publications and information about their associated tables</entry>
      </row>

<row>
@@ -11635,7 +11635,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx

   <para>
    The view <structname>pg_publication_tables</structname> provides
-   information about the mapping between publications and information of
+   information about the mapping between publications and information about
    tables they contain.  Unlike the underlying catalog
    <link linkend="catalog-pg-publication-rel"><structname>pg_publication_rel</structname></link>,
    this view expands publications defined as <literal>FOR ALL TABLES</literal>
@@ -11695,7 +11695,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </para>
       <para>
        Names of table columns included in the publication. This contains all
-       the columns of the table when the user didn't specify the column list
+       the columns of the table when the user didn't specify a column list
        for the table.
       </para></entry>
      </row>
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 8c7fca62de3..2f706f638ce 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1077,7 +1077,7 @@ get_publication_name(Oid pubid, bool missing_ok)
 }
 /*
- * Returns information of tables in a publication.
+ * Returns information about tables in a publication.
  */
 Datum
 pg_get_publication_tables(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87aa571a331..86f13293090 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11673,7 +11673,7 @@
   prosrc => 'pg_show_replication_origin_status' },
 # publications
-{ oid => '6119', descr => 'get information of tables in a publication',
+{ oid => '6119', descr => 'get information about tables in a publication',
   proname => 'pg_get_publication_tables', prorows => '1000', proretset => 't',
   provolatile => 's', prorettype => 'record', proargtypes => 'text',
   proallargtypes => '{text,oid,int2vector,pg_node_tree}', proargmodes => '{i,o,o,o}',
#71Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#70)
Re: bogus: logical replication rows/cols combinations

Justin Pryzby <pryzby@telsasoft.com> writes:

On Thu, May 19, 2022 at 10:33:13AM +0530, Amit Kapila wrote:

I have committed the first patch after fixing this part. It seems Tom
is not very happy doing this after beta-1 [1]. The reason we get this
information via this view (and underlying function) is that it
simplifies the queries on the subscriber-side as you can see in the
second patch. The query change is as below:
[1] - /messages/by-id/91075.1652929852@sss.pgh.pa.us

I think Tom's concern is that adding information to a view seems like adding a
feature that hadn't previously been contemplated.
(Catalog changes themselves are not prohibited during the beta period).

It certainly smells like a new feature, but my concern was more around the
post-beta catalog change. We do those only if really forced to, and the
explanation in the commit message didn't satisfy me as to why it was
necessary. This explanation isn't much better --- if we're trying to
prohibit a certain class of publication definitions, what good does it do
to check that on the subscriber side? Even more to the point, how can we
have a subscriber do that by relying on view columns that don't exist in
older versions? I'm also quite concerned about anything that involves
subscribers examining row filter conditions; that sounds like a pretty
direct route to bugs involving unsolvability and the halting problem.

(But I've not read very much of this thread ... been a bit under the
weather the last couple weeks. Maybe this actually is a sane solution.
It just doesn't sound like one at this level of detail.)

regards, tom lane

#72Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#71)
Re: bogus: logical replication rows/cols combinations

On Thu, May 19, 2022 at 7:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Thu, May 19, 2022 at 10:33:13AM +0530, Amit Kapila wrote:

I have committed the first patch after fixing this part. It seems Tom
is not very happy doing this after beta-1 [1]. The reason we get this
information via this view (and underlying function) is that it
simplifies the queries on the subscriber-side as you can see in the
second patch. The query change is as below:
[1] - /messages/by-id/91075.1652929852@sss.pgh.pa.us

I think Tom's concern is that adding information to a view seems like adding a
feature that hadn't previously been contemplated.
(Catalog changes themselves are not prohibited during the beta period).

It certainly smells like a new feature, but my concern was more around the
post-beta catalog change. We do those only if really forced to, and the
explanation in the commit message didn't satisfy me as to why it was
necessary. This explanation isn't much better --- if we're trying to
prohibit a certain class of publication definitions, what good does it do
to check that on the subscriber side?

It is required on the subscriber side because the prohibition applies
only to cases where multiple publications are combined. We disallow the
cases where the column list is different for the same table when
combining publications. For example:

Publisher-side:
Create table tab(c1 int, c2 int);
Create Publication pub1 for table tab(c1);
Create Publication pub2 for table tab(c2);

Subscriber-side:
Create Subscription sub1 Connection 'dbname=postgres' Publication pub1, pub2;

We want to prohibit such cases. So, it would be better to check at the
time of 'Create Subscription' to validate such combinations and
prohibit them. To achieve that we extended the existing function
pg_get_publication_tables() and view pg_publication_tables to expose
the column list and verify such a combination. We primarily need the
column-list information for this prohibition, but it appeared natural
to expose the row filter as well.

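The duplicate-table check added to fetch_table_list() can be sketched in isolation: because the query selects DISTINCT on (schemaname, tablename, attnames), the same table can only appear in the result twice if its column lists differ between publications, so spotting a repeated name is enough. A minimal standalone sketch of that idea (the function name and list representation are illustrative, not the actual PostgreSQL code):

```c
#include <string.h>
#include <assert.h>

/*
 * Sketch of the duplicate-detection idea behind fetch_table_list():
 * the rows fetched from pg_publication_tables are DISTINCT on
 * (schema, table, column list), so a table name appearing twice means
 * two publications publish it with different column lists.
 * Returns the index of the first repeated name, or -1 if none.
 */
int
find_conflicting_table(const char *tables[], int ntables)
{
	for (int i = 0; i < ntables; i++)
		for (int j = 0; j < i; j++)
			if (strcmp(tables[i], tables[j]) == 0)
				return i;		/* would trigger the ereport(ERROR) */
	return -1;
}
```

In the real code the comparison is done with list_member() on RangeVar nodes rather than strings, and hitting a duplicate raises ERRCODE_FEATURE_NOT_SUPPORTED.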
As mentioned in my previous email, we can fetch the required
information directly from system table pg_publication_rel and extend
the query in fetch_table_list to achieve the desired purpose but
extending the existing function/view for this appears to be a simpler
way.

Even more to the point, how can we
have a subscriber do that by relying on view columns that don't exist in
older versions?

We need a version check like (if
(walrcv_server_version(LogRepWorkerWalRcvConn) >= 150000)) for that.

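For illustration, the shape of such a version gate could look like the following standalone sketch; choose_table_list_query() is a hypothetical helper, not the actual patch code, and the query strings are abbreviated:

```c
#include <string.h>
#include <assert.h>

/*
 * Hypothetical sketch: only request the attnames column (added to
 * pg_publication_tables in v15) when the publisher is new enough,
 * mirroring the walrcv_server_version() >= 150000 check described
 * above. Older publishers get the pre-v15 two-column query.
 */
const char *
choose_table_list_query(int server_version_num)
{
	if (server_version_num >= 150000)
		return "SELECT DISTINCT t.schemaname, t.tablename, t.attnames"
			   "  FROM pg_catalog.pg_publication_tables t";

	/* Older publishers have no attnames column; fall back. */
	return "SELECT DISTINCT t.schemaname, t.tablename"
		   "  FROM pg_catalog.pg_publication_tables t";
}
```

With a gate like this, the column-list conflict check is simply skipped when subscribing to a pre-v15 publisher, where combining different column lists cannot arise.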
I'm also quite concerned about anything that involves
subscribers examining row filter conditions; that sounds like a pretty
direct route to bugs involving unsolvability and the halting problem.

We examine only the column list for the purpose of this prohibition.

--
With Regards,
Amit Kapila.

#73houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#72)
1 attachment(s)
RE: bogus: logical replication rows/cols combinations

On Friday, May 20, 2022 11:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 19, 2022 at 7:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Even more to the point, how can we
have a subscriber do that by relying on view columns that don't exist
in older versions?

We need a version check like (if
(walrcv_server_version(LogRepWorkerWalRcvConn) >= 150000)) for that.

Thanks for pointing it out. Here is the new version patch which adds this version check.

Best regards,
Hou zj

Attachments:

v4-0002-Prohibit-combining-publications-with-different-colum.patch (application/octet-stream)
From 080d3e8ac5d9ec44b52f0a03e6d204d7f451a779 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Fri, 13 May 2022 13:19:49 +0800
Subject: [PATCH] Prohibit combining publications with different column lists.

The main purpose of introducing a column list is to have statically
different shapes on publisher and subscriber or hide sensitive column
data. In both cases, it doesn't seem to make sense to combine column
lists.

So, we disallow the cases where the column list is different for the same
table when combining publications. It can be later extended to combine the
column lists for selective cases where required.

Reported-by: Alvaro Herrera
Author: Hou Zhijie
Reviewed-by: Amit Kapila, Alvaro Herrera, Shi Yu, Takamichi Osumi
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
---
 doc/src/sgml/ref/alter_publication.sgml     |  12 ++-
 doc/src/sgml/ref/create_subscription.sgml   |   5 +
 src/backend/commands/subscriptioncmds.c     |  30 ++++--
 src/backend/replication/logical/tablesync.c |  72 ++++++-------
 src/backend/replication/pgoutput/pgoutput.c |  80 +++++++--------
 src/test/subscription/t/031_column_list.pl  | 150 ++++++++++++----------------
 6 files changed, 182 insertions(+), 167 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index e2cce49..f03933a 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -116,7 +116,17 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
 
      <para>
       Optionally, a column list can be specified.  See <xref
-      linkend="sql-createpublication"/> for details.
+      linkend="sql-createpublication"/> for details. Note that a subscription
+      having several publications in which the same table has been published
+      with different column lists is not supported. So, changing the column
+      lists of the tables being subscribed could cause inconsistency of column
+      lists among publications in which case <command>ALTER PUBLICATION</command>
+      command will be successful but later the WalSender in publisher or the
+      subscriber may throw an error. In this scenario, the user needs to
+      recreate the subscription after adjusting the column list or drop the
+      problematic publication using
+      <literal>ALTER SUBSCRIPTION ... DROP PUBLICATION</literal> and then add
+      it back after adjusting the column list.
      </para>
 
      <para>
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 203bb41..f6f82a0 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -356,6 +356,11 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
   </para>
 
   <para>
+   Subscription having several publications in which the same table has been
+   published with different column lists is not supported.
+  </para>
+
+  <para>
    We allow non-existent publications to be specified so that users can add
    those later. This means
    <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 690cdaa..641caf7 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1753,7 +1753,8 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
 
 /*
  * Get the list of tables which belong to specified publications on the
- * publisher connection.
+ * publisher connection. Also get the column list for each table and check if
+ * column lists are the same in different publications.
  */
 static List *
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
@@ -1761,17 +1762,23 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 	WalRcvExecResult *res;
 	StringInfoData cmd;
 	TupleTableSlot *slot;
-	Oid			tableRow[2] = {TEXTOID, TEXTOID};
+	Oid			tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
 	List	   *tablelist = NIL;
+	bool		check_columnlist = (walrcv_server_version(wrconn) >= 150000);
 
 	initStringInfo(&cmd);
-	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
-						   "  FROM pg_catalog.pg_publication_tables t\n"
+	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");
+
+	/* Get column lists for each relation if the publisher supports it */
+	if (check_columnlist)
+		appendStringInfoString(&cmd, ", t.attnames\n");
+
+	appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n"
 						   " WHERE t.pubname IN (");
 	get_publications_str(publications, &cmd, true);
 	appendStringInfoChar(&cmd, ')');
 
-	res = walrcv_exec(wrconn, cmd.data, 2, tableRow);
+	res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow);
 	pfree(cmd.data);
 
 	if (res->status != WALRCV_OK_TUPLES)
@@ -1795,7 +1802,18 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 		Assert(!isnull);
 
 		rv = makeRangeVar(nspname, relname, -1);
-		tablelist = lappend(tablelist, rv);
+
+		/*
+		 * We don't support the case where column list is different for the
+		 * same table in different publications.
+		 */
+		if (check_columnlist && list_member(tablelist, rv))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+		else
+			tablelist = lappend(tablelist, rv);
 
 		ExecClearTuple(slot);
 	}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 994c7a0..e4633bd 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -753,17 +753,6 @@ fetch_remote_table_info(char *nspname, char *relname,
 	/*
 	 * Get column lists for each relation.
 	 *
-	 * For initial synchronization, column lists can be ignored in following
-	 * cases:
-	 *
-	 * 1) one of the subscribed publications for the table hasn't specified
-	 * any column list
-	 *
-	 * 2) one of the subscribed publications has puballtables set to true
-	 *
-	 * 3) one of the subscribed publications is declared as ALL TABLES IN
-	 * SCHEMA that includes this relation
-	 *
 	 * We need to do this before fetching info about column names and types,
 	 * so that we can skip columns that should not be replicated.
 	 */
@@ -771,7 +760,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	{
 		WalRcvExecResult *pubres;
 		TupleTableSlot *slot;
-		Oid			attrsRow[] = {INT2OID};
+		Oid			attrsRow[] = {INT2VECTOROID};
 		StringInfoData pub_names;
 		bool		first = true;
 
@@ -786,19 +775,17 @@ fetch_remote_table_info(char *nspname, char *relname,
 
 		/*
 		 * Fetch info about column lists for the relation (from all the
-		 * publications). We unnest the int2vector values, because that makes
-		 * it easier to combine lists by simply adding the attnums to a new
-		 * bitmap (without having to parse the int2vector data). This
-		 * preserves NULL values, so that if one of the publications has no
-		 * column list, we'll know that.
+		 * publications).
 		 */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT unnest"
+						 "SELECT DISTINCT"
+						 "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
+						 "   THEN NULL ELSE gpt.attrs END)"
 						 "  FROM pg_publication p,"
-						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
-						 "  LEFT OUTER JOIN unnest(gpt.attrs) ON TRUE"
-						 " WHERE gpt.relid = %u"
+						 "  LATERAL pg_get_publication_tables(p.pubname) gpt,"
+						 "  pg_class c"
+						 " WHERE gpt.relid = %u AND c.oid = gpt.relid"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
 						 pub_names.data);
@@ -813,26 +800,43 @@ fetch_remote_table_info(char *nspname, char *relname,
 							nspname, relname, pubres->err)));
 
 		/*
-		 * Merge the column lists (from different publications) by creating a
-		 * single bitmap with all the attnums. If we find a NULL value, that
-		 * means one of the publications has no column list for the table
-		 * we're syncing.
+		 * We don't support the case where the column list is different for the
+		 * same table when combining publications. So there should be only one
+		 * row returned. Although we already checked this when creating
+		 * subscription, we still need to check here in case the column list
+		 * was changed after creating the subscription and before the sync
+		 * worker is started.
+		 */
+		if (tuplestore_tuple_count(pubres->tuplestore) > 1)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+
+		/*
+		 * Get the column list and build a single bitmap with the attnums.
+		 *
+		 * If we find a NULL value, it means all the columns should be
+		 * replicated.
 		 */
 		slot = MakeSingleTupleTableSlot(pubres->tupledesc, &TTSOpsMinimalTuple);
-		while (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
+		if (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
 		{
 			Datum		cfval = slot_getattr(slot, 1, &isnull);
 
-			/* NULL means empty column list, so we're done. */
-			if (isnull)
+			if (!isnull)
 			{
-				bms_free(included_cols);
-				included_cols = NULL;
-				break;
-			}
+				ArrayType  *arr;
+				int			nelems;
+				int16	   *elems;
 
-			included_cols = bms_add_member(included_cols,
-										   DatumGetInt16(cfval));
+				arr = DatumGetArrayTypeP(cfval);
+				nelems = ARR_DIMS(arr)[0];
+				elems = (int16 *) ARR_DATA_PTR(arr);
+
+				for (natt = 0; natt < nelems; natt++)
+					included_cols = bms_add_member(included_cols, elems[natt]);
+			}
 
 			ExecClearTuple(slot);
 		}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 42c06af..e9079bd 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -979,30 +979,31 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 						  RelationSyncEntry *entry)
 {
 	ListCell   *lc;
+	bool		first = true;
+	Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
 
 	/*
 	 * Find if there are any column lists for this relation. If there are,
-	 * build a bitmap merging all the column lists.
-	 *
-	 * All the given publication-table mappings must be checked.
+	 * build a bitmap using the column lists.
 	 *
 	 * Multiple publications might have multiple column lists for this
 	 * relation.
 	 *
+	 * Note that we don't support the case where the column list is different
+	 * for the same table when combining publications. But one can later change
+	 * the publication so we still need to check all the given
+	 * publication-table mappings and report an error if any publications have
+	 * a different column list.
+	 *
 	 * FOR ALL TABLES and FOR ALL TABLES IN SCHEMA implies "don't use column
-	 * list" so it takes precedence.
+	 * list".
 	 */
 	foreach(lc, publications)
 	{
 		Publication *pub = lfirst(lc);
 		HeapTuple	cftuple = NULL;
 		Datum		cfdatum = 0;
-
-		/*
-		 * Assume there's no column list. Only if we find pg_publication_rel
-		 * entry with a column list we'll switch it to false.
-		 */
-		bool		pub_no_list = true;
+		Bitmapset	*cols = NULL;
 
 		/*
 		 * If the publication is FOR ALL TABLES then it is treated the same as
@@ -1011,6 +1012,8 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		 */
 		if (!pub->alltables)
 		{
+			bool		pub_no_list = true;
+
 			/*
 			 * Check for the presence of a column list in this publication.
 			 *
@@ -1024,51 +1027,48 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 
 			if (HeapTupleIsValid(cftuple))
 			{
-				/*
-				 * Lookup the column list attribute.
-				 *
-				 * Note: We update the pub_no_list value directly, because if
-				 * the value is NULL, we have no list (and vice versa).
-				 */
+				/* Lookup the column list attribute. */
 				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
 										  Anum_pg_publication_rel_prattrs,
 										  &pub_no_list);
 
-				/*
-				 * Build the column list bitmap in the per-entry context.
-				 *
-				 * We need to merge column lists from all publications, so we
-				 * update the same bitmapset. If the column list is null, we
-				 * interpret it as replicating all columns.
-				 */
+				/* Build the column list bitmap in the per-entry context. */
 				if (!pub_no_list)	/* when not null */
 				{
 					pgoutput_ensure_entry_cxt(data, entry);
 
-					entry->columns = pub_collist_to_bitmapset(entry->columns,
-															  cfdatum,
-															  entry->entry_cxt);
+					cols = pub_collist_to_bitmapset(cols, cfdatum,
+													entry->entry_cxt);
+
+					/*
+					 * If column list includes all the columns of the table,
+					 * set it to NULL.
+					 */
+					if (bms_num_members(cols) == RelationGetNumberOfAttributes(relation))
+					{
+						bms_free(cols);
+						cols = NULL;
+					}
 				}
+
+				ReleaseSysCache(cftuple);
 			}
 		}
 
-		/*
-		 * Found a publication with no column list, so we're done. But first
-		 * discard column list we might have from preceding publications.
-		 */
-		if (pub_no_list)
+		if (first)
 		{
-			if (cftuple)
-				ReleaseSysCache(cftuple);
-
-			bms_free(entry->columns);
-			entry->columns = NULL;
-
-			break;
+			entry->columns = cols;
+			first = false;
 		}
-
-		ReleaseSysCache(cftuple);
+		else if (!bms_equal(entry->columns, cols))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   get_namespace_name(RelationGetNamespace(relation)),
+						   RelationGetRelationName(relation)));
 	}							/* loop all subscribed publications */
+
+	RelationClose(relation);
 }
 
 /*
diff --git a/src/test/subscription/t/031_column_list.pl b/src/test/subscription/t/031_column_list.pl
index 19812e1..7f031bc 100644
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -20,6 +20,7 @@ $node_subscriber->append_conf('postgresql.conf',
 $node_subscriber->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+my $offset = 0;
 
 sub wait_for_subscription_sync
 {
@@ -361,13 +362,13 @@ is( $result, qq(1|abc
 2|xyz), 'update on column tab2.c is not replicated');
 
 
-# TEST: add a table to two publications with different column lists, and
+# TEST: add a table to two publications with same column lists, and
 # create a single subscription replicating both publications
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE tab5 (a int PRIMARY KEY, b int, c int, d int);
 	CREATE PUBLICATION pub2 FOR TABLE tab5 (a, b);
-	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, d);
+	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, b);
 
 	-- insert a couple initial records
 	INSERT INTO tab5 VALUES (1, 11, 111, 1111);
@@ -388,8 +389,7 @@ wait_for_subscription_sync($node_subscriber);
 
 $node_publisher->wait_for_catchup('sub1');
 
-# insert data and make sure all the columns (union of the columns lists)
-# get fully replicated
+# insert data and make sure the columns in column list get fully replicated
 $node_publisher->safe_psql(
 	'postgres', qq(
 	INSERT INTO tab5 VALUES (3, 33, 333, 3333);
@@ -399,42 +399,12 @@ $node_publisher->safe_psql(
 $node_publisher->wait_for_catchup('sub1');
 
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111
-2|22|2222
-3|33|3333
-4|44|4444),
+	qq(1|11|
+2|22|
+3|33|
+4|44|),
 	'overlapping publications with overlapping column lists');
 
-# and finally, remove the column list for one of the publications, which
-# means replicating all columns (removing the column list), but first add
-# the missing column to the table on subscriber
-$node_publisher->safe_psql(
-	'postgres', qq(
-	ALTER PUBLICATION pub3 SET TABLE tab5;
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;
-	ALTER TABLE tab5 ADD COLUMN c INT;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO tab5 VALUES (5, 55, 555, 5555);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111|
-2|22|2222|
-3|33|3333|
-4|44|4444|
-5|55|5555|555),
-	'overlapping publications with overlapping column lists');
 
 # TEST: create a table with a column list, then change the replica
 # identity by replacing a primary key (but use a different column in
@@ -900,57 +870,21 @@ is( $node_subscriber->safe_psql(
 3|),
 	'partitions with different replica identities not replicated correctly');
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. So with column lists (a,b) and (a,c) we
-# should replicate (a,b,c).
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
-	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
-
-	-- initial data
-	INSERT INTO test_mix_1 VALUES (1, 2, 3);
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_mix_1, pub_mix_2;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO test_mix_1 VALUES (4, 5, 6);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql(
-		'postgres', "SELECT * FROM test_mix_1 ORDER BY a"),
-	qq(1|2|3
-4|5|6),
-	'a mix of publications should use a union of column list');
-
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES, we should replicate all columns.
+# TEST: With a table included in the publications which is FOR ALL TABLES, it
+# means replicate all columns.
 
 # drop unnecessary tables, so as not to interfere with the FOR ALL TABLES
 $node_publisher->safe_psql(
 	'postgres', qq(
-	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7, test_mix_1,
+	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7,
 			   test_part, test_part_a, test_part_b, test_part_c, test_part_d;
 ));
 
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE test_mix_2 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b);
+	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b, c);
 	CREATE PUBLICATION pub_mix_4 FOR ALL TABLES;
 
 	-- initial data
@@ -976,12 +910,11 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_2"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES IN SCHEMA, we should replicate all columns.
+# TEST: With a table included in the publication which is FOR ALL TABLES
+# IN SCHEMA, it means replicate all columns.
 
 $node_subscriber->safe_psql(
 	'postgres', qq(
@@ -993,7 +926,7 @@ $node_publisher->safe_psql(
 	'postgres', qq(
 	DROP TABLE test_mix_2;
 	CREATE TABLE test_mix_3 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b);
+	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b, c);
 	CREATE PUBLICATION pub_mix_6 FOR ALL TABLES IN SCHEMA public;
 
 	-- initial data
@@ -1017,7 +950,7 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_3"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
 # TEST: Check handling of publish_via_partition_root - if a partition is
@@ -1074,7 +1007,7 @@ is( $node_subscriber->safe_psql(
 # TEST: Multiple publications which publish schema of parent table and
 # partition. The partition is published through two publications, once
 # through a schema (so no column list) containing the parent, and then
-# also directly (with a columns list). The expected outcome is there is
+# also directly (with all columns). The expected outcome is there is
 # no column list.
 
 $node_publisher->safe_psql(
@@ -1086,7 +1019,7 @@ $node_publisher->safe_psql(
 	CREATE TABLE t_1 PARTITION OF s1.t FOR VALUES FROM (1) TO (10);
 
 	CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA s1;
-	CREATE PUBLICATION pub2 FOR TABLE t_1(b);
+	CREATE PUBLICATION pub2 FOR TABLE t_1(a, b, c);
 
 	-- initial data
 	INSERT INTO s1.t VALUES (1, 2, 3);
@@ -1233,6 +1166,51 @@ is( $node_subscriber->safe_psql(
 	'publication containing both parent and child relation');
 
 
+# TEST: With a table included in multiple publications with different column
+# lists, we should catch the error when creating the subscription.
+
+$node_publisher->safe_psql('postgres', qq(
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
+	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	DROP SUBSCRIPTION sub1;
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+));
+
+my ($cmdret, $stdout, $stderr) = $node_subscriber->psql(
+	'postgres', qq(
+	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_1, pub_mix_2;
+));
+
+ok( $stderr =~
+	  qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	'different column lists detected');
+
+# TEST: If the column list is changed after creating the subscription, we
+# should catch the error reported by walsender.
+
+$node_publisher->safe_psql('postgres', qq(
+	ALTER PUBLICATION pub_mix_1 SET TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_1, pub_mix_2;
+));
+
+$node_publisher->wait_for_catchup('sub1');
+
+$node_publisher->safe_psql('postgres', qq(
+	ALTER PUBLICATION pub_mix_1 SET TABLE test_mix_1 (a, b);
+	INSERT INTO test_mix_1 VALUES(1, 1, 1);
+));
+
+$offset = $node_publisher->wait_for_log(
+	qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	$offset);
+
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
 
-- 
2.7.2.windows.1

#74Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#72)
Re: bogus: logical replication rows/cols combinations

On Fri, May 20, 2022 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 19, 2022 at 7:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Thu, May 19, 2022 at 10:33:13AM +0530, Amit Kapila wrote:

I have committed the first patch after fixing this part. It seems Tom
is not very happy doing this after beta-1 [1]. The reason we get this
information via this view (and underlying function) is that it
simplifies the queries on the subscriber-side as you can see in the
second patch. The query change is as below:
[1] - /messages/by-id/91075.1652929852@sss.pgh.pa.us

I think Tom's concern is that adding information to a view seems like adding a
feature that hadn't previously been contemplated.
(Catalog changes themselves are not prohibited during the beta period).

It certainly smells like a new feature, but my concern was more around the
post-beta catalog change. We do those only if really forced to, and the
explanation in the commit message didn't satisfy me as to why it was
necessary. This explanation isn't much better --- if we're trying to
prohibit a certain class of publication definitions, what good does it do
to check that on the subscriber side?

It is required on the subscriber side because prohibition is only for
the cases where multiple publications are combined. We disallow the
cases where the column list is different for the same table when
combining publications. For example:

Publisher-side:
Create table tab(c1 int, c2 int);
Create Publication pub1 for table tab(c1);
Create Publication pub2 for table tab(c2);

Subscriber-side:
Create Subscription sub1 Connection 'dbname=postgres' Publication pub1, pub2;

We want to prohibit such cases. So, it would be better to check at the
time of 'Create Subscription' to validate such combinations and
prohibit them. To achieve that we extended the existing function
pg_get_publication_tables() and view pg_publication_tables to expose
the column list and verify such a combination. We primarily need
column list information for this prohibition but it appeared natural
to expose the row filter.

I still feel that the current approach to extend the underlying
function and view is a better idea but if you and or others are not
convinced then we can try to achieve it by extending the existing
query on the subscriber side as mentioned in my previous email [1].
Kindly let me know your opinion?

[1]: /messages/by-id/CAA4eK1KfL=ez5fKPB-0Nrgf7wiqN9bXP-YHHj2YH5utXAmjYug@mail.gmail.com

--
With Regards,
Amit Kapila.

#75Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#74)
Re: bogus: logical replication rows/cols combinations

On Tue, May 24, 2022 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, May 20, 2022 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 19, 2022 at 7:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Thu, May 19, 2022 at 10:33:13AM +0530, Amit Kapila wrote:

I have committed the first patch after fixing this part. It seems Tom
is not very happy doing this after beta-1 [1]. The reason we get this
information via this view (and underlying function) is that it
simplifies the queries on the subscriber-side as you can see in the
second patch. The query change is as below:
[1] - /messages/by-id/91075.1652929852@sss.pgh.pa.us

I think Tom's concern is that adding information to a view seems like adding a
feature that hadn't previously been contemplated.
(Catalog changes themselves are not prohibited during the beta period).

It certainly smells like a new feature, but my concern was more around the
post-beta catalog change. We do those only if really forced to, and the
explanation in the commit message didn't satisfy me as to why it was
necessary. This explanation isn't much better --- if we're trying to
prohibit a certain class of publication definitions, what good does it do
to check that on the subscriber side?

It is required on the subscriber side because prohibition is only for
the cases where multiple publications are combined. We disallow the
cases where the column list is different for the same table when
combining publications. For example:

Publisher-side:
Create table tab(c1 int, c2 int);
Create Publication pub1 for table tab(c1);
Create Publication pub2 for table tab(c2);

Subscriber-side:
Create Subscription sub1 Connection 'dbname=postgres' Publication pub1, pub2;

We want to prohibit such cases. So, it would be better to check at the
time of 'Create Subscription' to validate such combinations and
prohibit them. To achieve that we extended the existing function
pg_get_publication_tables() and view pg_publication_tables to expose
the column list and verify such a combination. We primarily need
column list information for this prohibition but it appeared natural
to expose the row filter.

I still feel that the current approach to extend the underlying
function and view is a better idea but if you and or others are not
convinced then we can try to achieve it by extending the existing
query on the subscriber side as mentioned in my previous email [1].
Kindly let me know your opinion?

Unless someone has objections or thinks otherwise, I am planning to
proceed with the approach of extending the function/view (patch for
which is already committed) and using it to prohibit the combinations
of publications having different column lists as is done in the
currently proposed patch [1].

[1]: /messages/by-id/OS0PR01MB5716AD7C0FE7386630BDBAAB94D79@OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

#76Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#73)
1 attachment(s)
Re: bogus: logical replication rows/cols combinations

On Tue, May 24, 2022 at 11:03 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Friday, May 20, 2022 11:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thanks for pointing it out. Here is the new version patch which adds this version check.

I have added/edited a few comments and ran pgindent. The attached
looks good to me. I'll push this early next week unless there are more
comments/suggestions.

--
With Regards,
Amit Kapila.

Attachments:

v5-0001-Prohibit-combining-publications-with-different-co.patch (application/octet-stream)
From 201721776782809c3bb99bb6ff8fbdedb02483ab Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 27 May 2022 08:30:33 +0530
Subject: [PATCH v5] Prohibit combining publications with different column
 lists.

The main purpose of introducing a column list is to have statically
different shapes on publisher and subscriber or hide sensitive column
data. In both cases, it doesn't seem to make sense to combine column
lists.

So, we disallow the cases where the column list is different for the same
table when combining publications. It can be later extended to combine the
column lists for selective cases where required.

Reported-by: Alvaro Herrera
Author: Hou Zhijie
Reviewed-by: Amit Kapila, Alvaro Herrera, Shi Yu, Takamichi Osumi
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
---
 doc/src/sgml/ref/alter_publication.sgml     |  12 ++-
 doc/src/sgml/ref/create_subscription.sgml   |   5 +
 src/backend/commands/subscriptioncmds.c     |  28 +++++-
 src/backend/replication/logical/tablesync.c |  72 ++++++-------
 src/backend/replication/pgoutput/pgoutput.c |  80 +++++++--------
 src/test/subscription/t/031_column_list.pl  | 150 ++++++++++++----------------
 6 files changed, 181 insertions(+), 166 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index e2cce49..f03933a 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -116,7 +116,17 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
 
      <para>
       Optionally, a column list can be specified.  See <xref
-      linkend="sql-createpublication"/> for details.
+      linkend="sql-createpublication"/> for details. Note that a subscription
+      having several publications in which the same table has been published
+      with different column lists is not supported. So, changing the column
+      lists of the tables being subscribed could cause inconsistency of column
+      lists among publications in which case <command>ALTER PUBLICATION</command>
+      command will be successful but later the WalSender in publisher or the
+      subscriber may throw an error. In this scenario, the user needs to
+      recreate the subscription after adjusting the column list or drop the
+      problematic publication using
+      <literal>ALTER SUBSCRIPTION ... DROP PUBLICATION</literal> and then add
+      it back after adjusting the column list.
      </para>
 
      <para>
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 203bb41..f6f82a0 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -356,6 +356,11 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
   </para>
 
   <para>
+   Subscription having several publications in which the same table has been
+   published with different column lists is not supported.
+  </para>
+
+  <para>
    We allow non-existent publications to be specified so that users can add
    those later. This means
    <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 690cdaa..83e6eae 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1754,6 +1754,11 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
 /*
  * Get the list of tables which belong to specified publications on the
  * publisher connection.
+ *
+ * Note that we don't support the case where the column list is different for
+ * the same table in different publications to avoid sending unwanted column
+ * information for some of the rows. This can happen when both the column
+ * list and row filter are specified for different publications.
  */
 static List *
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
@@ -1761,17 +1766,23 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 	WalRcvExecResult *res;
 	StringInfoData cmd;
 	TupleTableSlot *slot;
-	Oid			tableRow[2] = {TEXTOID, TEXTOID};
+	Oid			tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
 	List	   *tablelist = NIL;
+	bool		check_columnlist = (walrcv_server_version(wrconn) >= 150000);
 
 	initStringInfo(&cmd);
-	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
-						   "  FROM pg_catalog.pg_publication_tables t\n"
+	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");
+
+	/* Get column lists for each relation if the publisher supports it */
+	if (check_columnlist)
+		appendStringInfoString(&cmd, ", t.attnames\n");
+
+	appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n"
 						   " WHERE t.pubname IN (");
 	get_publications_str(publications, &cmd, true);
 	appendStringInfoChar(&cmd, ')');
 
-	res = walrcv_exec(wrconn, cmd.data, 2, tableRow);
+	res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow);
 	pfree(cmd.data);
 
 	if (res->status != WALRCV_OK_TUPLES)
@@ -1795,7 +1806,14 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 		Assert(!isnull);
 
 		rv = makeRangeVar(nspname, relname, -1);
-		tablelist = lappend(tablelist, rv);
+
+		if (check_columnlist && list_member(tablelist, rv))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+		else
+			tablelist = lappend(tablelist, rv);
 
 		ExecClearTuple(slot);
 	}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 994c7a0..670c6fc 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -753,17 +753,6 @@ fetch_remote_table_info(char *nspname, char *relname,
 	/*
 	 * Get column lists for each relation.
 	 *
-	 * For initial synchronization, column lists can be ignored in following
-	 * cases:
-	 *
-	 * 1) one of the subscribed publications for the table hasn't specified
-	 * any column list
-	 *
-	 * 2) one of the subscribed publications has puballtables set to true
-	 *
-	 * 3) one of the subscribed publications is declared as ALL TABLES IN
-	 * SCHEMA that includes this relation
-	 *
 	 * We need to do this before fetching info about column names and types,
 	 * so that we can skip columns that should not be replicated.
 	 */
@@ -771,7 +760,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	{
 		WalRcvExecResult *pubres;
 		TupleTableSlot *slot;
-		Oid			attrsRow[] = {INT2OID};
+		Oid			attrsRow[] = {INT2VECTOROID};
 		StringInfoData pub_names;
 		bool		first = true;
 
@@ -786,19 +775,17 @@ fetch_remote_table_info(char *nspname, char *relname,
 
 		/*
 		 * Fetch info about column lists for the relation (from all the
-		 * publications). We unnest the int2vector values, because that makes
-		 * it easier to combine lists by simply adding the attnums to a new
-		 * bitmap (without having to parse the int2vector data). This
-		 * preserves NULL values, so that if one of the publications has no
-		 * column list, we'll know that.
+		 * publications).
 		 */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT unnest"
+						 "SELECT DISTINCT"
+						 "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
+						 "   THEN NULL ELSE gpt.attrs END)"
 						 "  FROM pg_publication p,"
-						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
-						 "  LEFT OUTER JOIN unnest(gpt.attrs) ON TRUE"
-						 " WHERE gpt.relid = %u"
+						 "  LATERAL pg_get_publication_tables(p.pubname) gpt,"
+						 "  pg_class c"
+						 " WHERE gpt.relid = %u AND c.oid = gpt.relid"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
 						 pub_names.data);
@@ -813,26 +800,43 @@ fetch_remote_table_info(char *nspname, char *relname,
 							nspname, relname, pubres->err)));
 
 		/*
-		 * Merge the column lists (from different publications) by creating a
-		 * single bitmap with all the attnums. If we find a NULL value, that
-		 * means one of the publications has no column list for the table
-		 * we're syncing.
+		 * We don't support the case where the column list is different for
+		 * the same table when combining publications. See comments atop
+		 * fetch_table_list. So there should be only one row returned.
+		 * Although we already checked this when creating the subscription, we
+		 * still need to check here in case the column list was changed after
+		 * creating the subscription and before the sync worker is started.
+		 */
+		if (tuplestore_tuple_count(pubres->tuplestore) > 1)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+
+		/*
+		 * Get the column list and build a single bitmap with the attnums.
+		 *
+		 * If we find a NULL value, it means all the columns should be
+		 * replicated.
 		 */
 		slot = MakeSingleTupleTableSlot(pubres->tupledesc, &TTSOpsMinimalTuple);
-		while (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
+		if (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
 		{
 			Datum		cfval = slot_getattr(slot, 1, &isnull);
 
-			/* NULL means empty column list, so we're done. */
-			if (isnull)
+			if (!isnull)
 			{
-				bms_free(included_cols);
-				included_cols = NULL;
-				break;
-			}
+				ArrayType  *arr;
+				int			nelems;
+				int16	   *elems;
 
-			included_cols = bms_add_member(included_cols,
-										   DatumGetInt16(cfval));
+				arr = DatumGetArrayTypeP(cfval);
+				nelems = ARR_DIMS(arr)[0];
+				elems = (int16 *) ARR_DATA_PTR(arr);
+
+				for (natt = 0; natt < nelems; natt++)
+					included_cols = bms_add_member(included_cols, elems[natt]);
+			}
 
 			ExecClearTuple(slot);
 		}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 42c06af..8deae57 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -979,30 +979,31 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 						  RelationSyncEntry *entry)
 {
 	ListCell   *lc;
+	bool		first = true;
+	Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
 
 	/*
 	 * Find if there are any column lists for this relation. If there are,
-	 * build a bitmap merging all the column lists.
-	 *
-	 * All the given publication-table mappings must be checked.
+	 * build a bitmap using the column lists.
 	 *
 	 * Multiple publications might have multiple column lists for this
 	 * relation.
 	 *
+	 * Note that we don't support the case where the column list is different
+	 * for the same table when combining publications. See comments atop
+	 * fetch_table_list. But one can later change the publication so we still
+	 * need to check all the given publication-table mappings and report an
+	 * error if any publications have a different column list.
+	 *
 	 * FOR ALL TABLES and FOR ALL TABLES IN SCHEMA implies "don't use column
-	 * list" so it takes precedence.
+	 * list".
 	 */
 	foreach(lc, publications)
 	{
 		Publication *pub = lfirst(lc);
 		HeapTuple	cftuple = NULL;
 		Datum		cfdatum = 0;
-
-		/*
-		 * Assume there's no column list. Only if we find pg_publication_rel
-		 * entry with a column list we'll switch it to false.
-		 */
-		bool		pub_no_list = true;
+		Bitmapset  *cols = NULL;
 
 		/*
 		 * If the publication is FOR ALL TABLES then it is treated the same as
@@ -1011,6 +1012,8 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		 */
 		if (!pub->alltables)
 		{
+			bool		pub_no_list = true;
+
 			/*
 			 * Check for the presence of a column list in this publication.
 			 *
@@ -1024,51 +1027,48 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 
 			if (HeapTupleIsValid(cftuple))
 			{
-				/*
-				 * Lookup the column list attribute.
-				 *
-				 * Note: We update the pub_no_list value directly, because if
-				 * the value is NULL, we have no list (and vice versa).
-				 */
+				/* Lookup the column list attribute. */
 				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
 										  Anum_pg_publication_rel_prattrs,
 										  &pub_no_list);
 
-				/*
-				 * Build the column list bitmap in the per-entry context.
-				 *
-				 * We need to merge column lists from all publications, so we
-				 * update the same bitmapset. If the column list is null, we
-				 * interpret it as replicating all columns.
-				 */
+				/* Build the column list bitmap in the per-entry context. */
 				if (!pub_no_list)	/* when not null */
 				{
 					pgoutput_ensure_entry_cxt(data, entry);
 
-					entry->columns = pub_collist_to_bitmapset(entry->columns,
-															  cfdatum,
-															  entry->entry_cxt);
+					cols = pub_collist_to_bitmapset(cols, cfdatum,
+													entry->entry_cxt);
+
+					/*
+					 * If column list includes all the columns of the table,
+					 * set it to NULL.
+					 */
+					if (bms_num_members(cols) == RelationGetNumberOfAttributes(relation))
+					{
+						bms_free(cols);
+						cols = NULL;
+					}
 				}
+
+				ReleaseSysCache(cftuple);
 			}
 		}
 
-		/*
-		 * Found a publication with no column list, so we're done. But first
-		 * discard column list we might have from preceding publications.
-		 */
-		if (pub_no_list)
+		if (first)
 		{
-			if (cftuple)
-				ReleaseSysCache(cftuple);
-
-			bms_free(entry->columns);
-			entry->columns = NULL;
-
-			break;
+			entry->columns = cols;
+			first = false;
 		}
-
-		ReleaseSysCache(cftuple);
+		else if (!bms_equal(entry->columns, cols))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   get_namespace_name(RelationGetNamespace(relation)),
+						   RelationGetRelationName(relation)));
 	}							/* loop all subscribed publications */
+
+	RelationClose(relation);
 }
 
 /*
diff --git a/src/test/subscription/t/031_column_list.pl b/src/test/subscription/t/031_column_list.pl
index 19812e1..7f031bc 100644
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -20,6 +20,7 @@ $node_subscriber->append_conf('postgresql.conf',
 $node_subscriber->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+my $offset = 0;
 
 sub wait_for_subscription_sync
 {
@@ -361,13 +362,13 @@ is( $result, qq(1|abc
 2|xyz), 'update on column tab2.c is not replicated');
 
 
-# TEST: add a table to two publications with different column lists, and
+# TEST: add a table to two publications with same column lists, and
 # create a single subscription replicating both publications
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE tab5 (a int PRIMARY KEY, b int, c int, d int);
 	CREATE PUBLICATION pub2 FOR TABLE tab5 (a, b);
-	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, d);
+	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, b);
 
 	-- insert a couple initial records
 	INSERT INTO tab5 VALUES (1, 11, 111, 1111);
@@ -388,8 +389,7 @@ wait_for_subscription_sync($node_subscriber);
 
 $node_publisher->wait_for_catchup('sub1');
 
-# insert data and make sure all the columns (union of the columns lists)
-# get fully replicated
+# insert data and make sure the columns in column list get fully replicated
 $node_publisher->safe_psql(
 	'postgres', qq(
 	INSERT INTO tab5 VALUES (3, 33, 333, 3333);
@@ -399,42 +399,12 @@ $node_publisher->safe_psql(
 $node_publisher->wait_for_catchup('sub1');
 
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111
-2|22|2222
-3|33|3333
-4|44|4444),
+	qq(1|11|
+2|22|
+3|33|
+4|44|),
 	'overlapping publications with overlapping column lists');
 
-# and finally, remove the column list for one of the publications, which
-# means replicating all columns (removing the column list), but first add
-# the missing column to the table on subscriber
-$node_publisher->safe_psql(
-	'postgres', qq(
-	ALTER PUBLICATION pub3 SET TABLE tab5;
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;
-	ALTER TABLE tab5 ADD COLUMN c INT;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO tab5 VALUES (5, 55, 555, 5555);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111|
-2|22|2222|
-3|33|3333|
-4|44|4444|
-5|55|5555|555),
-	'overlapping publications with overlapping column lists');
 
 # TEST: create a table with a column list, then change the replica
 # identity by replacing a primary key (but use a different column in
@@ -900,57 +870,21 @@ is( $node_subscriber->safe_psql(
 3|),
 	'partitions with different replica identities not replicated correctly');
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. So with column lists (a,b) and (a,c) we
-# should replicate (a,b,c).
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
-	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
-
-	-- initial data
-	INSERT INTO test_mix_1 VALUES (1, 2, 3);
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_mix_1, pub_mix_2;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO test_mix_1 VALUES (4, 5, 6);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql(
-		'postgres', "SELECT * FROM test_mix_1 ORDER BY a"),
-	qq(1|2|3
-4|5|6),
-	'a mix of publications should use a union of column list');
-
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES, we should replicate all columns.
+# TEST: With a table included in the publications which is FOR ALL TABLES, it
+# means replicate all columns.
 
 # drop unnecessary tables, so as not to interfere with the FOR ALL TABLES
 $node_publisher->safe_psql(
 	'postgres', qq(
-	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7, test_mix_1,
+	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7,
 			   test_part, test_part_a, test_part_b, test_part_c, test_part_d;
 ));
 
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE test_mix_2 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b);
+	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b, c);
 	CREATE PUBLICATION pub_mix_4 FOR ALL TABLES;
 
 	-- initial data
@@ -976,12 +910,11 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_2"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES IN SCHEMA, we should replicate all columns.
+# TEST: With a table included in the publication which is FOR ALL TABLES
+# IN SCHEMA, it means replicate all columns.
 
 $node_subscriber->safe_psql(
 	'postgres', qq(
@@ -993,7 +926,7 @@ $node_publisher->safe_psql(
 	'postgres', qq(
 	DROP TABLE test_mix_2;
 	CREATE TABLE test_mix_3 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b);
+	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b, c);
 	CREATE PUBLICATION pub_mix_6 FOR ALL TABLES IN SCHEMA public;
 
 	-- initial data
@@ -1017,7 +950,7 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_3"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
 # TEST: Check handling of publish_via_partition_root - if a partition is
@@ -1074,7 +1007,7 @@ is( $node_subscriber->safe_psql(
 # TEST: Multiple publications which publish schema of parent table and
 # partition. The partition is published through two publications, once
 # through a schema (so no column list) containing the parent, and then
-# also directly (with a columns list). The expected outcome is there is
+# also directly (with all columns). The expected outcome is there is
 # no column list.
 
 $node_publisher->safe_psql(
@@ -1086,7 +1019,7 @@ $node_publisher->safe_psql(
 	CREATE TABLE t_1 PARTITION OF s1.t FOR VALUES FROM (1) TO (10);
 
 	CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA s1;
-	CREATE PUBLICATION pub2 FOR TABLE t_1(b);
+	CREATE PUBLICATION pub2 FOR TABLE t_1(a, b, c);
 
 	-- initial data
 	INSERT INTO s1.t VALUES (1, 2, 3);
@@ -1233,6 +1166,51 @@ is( $node_subscriber->safe_psql(
 	'publication containing both parent and child relation');
 
 
+# TEST: With a table included in multiple publications with different column
+# lists, we should catch the error when creating the subscription.
+
+$node_publisher->safe_psql('postgres', qq(
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
+	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	DROP SUBSCRIPTION sub1;
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+));
+
+my ($cmdret, $stdout, $stderr) = $node_subscriber->psql(
+	'postgres', qq(
+	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_1, pub_mix_2;
+));
+
+ok( $stderr =~
+	  qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	'different column lists detected');
+
+# TEST: If the column list is changed after creating the subscription, we
+# should catch the error reported by walsender.
+
+$node_publisher->safe_psql('postgres', qq(
+	ALTER PUBLICATION pub_mix_1 SET TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_1, pub_mix_2;
+));
+
+$node_publisher->wait_for_catchup('sub1');
+
+$node_publisher->safe_psql('postgres', qq(
+	ALTER PUBLICATION pub_mix_1 SET TABLE test_mix_1 (a, b);
+	INSERT INTO test_mix_1 VALUES(1, 1, 1);
+));
+
+$offset = $node_publisher->wait_for_log(
+	qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	$offset);
+
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
 
-- 
1.8.3.1

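For anyone following along, here is a minimal session sketch of the scenario this patch now rejects. The table, publication, and subscription names are illustrative only (they are not taken from the patch); the error text is the one the patch introduces:

```sql
-- On the publisher:
CREATE TABLE t (a int PRIMARY KEY, b int, c int);
CREATE PUBLICATION p1 FOR TABLE t (a, b);
CREATE PUBLICATION p2 FOR TABLE t (a, c);

-- On the subscriber, subscribing to both publications at once should
-- now fail at CREATE SUBSCRIPTION time with something like:
--   ERROR:  cannot use different column lists for table "public.t"
--           in different publications
-- CREATE SUBSCRIPTION s1 CONNECTION '...' PUBLICATION p1, p2;
```

If the column lists are made to diverge only after the subscription exists, the same error is instead raised later by the walsender or the table sync worker, per the checks in pgoutput.c and tablesync.c above.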
#77Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#76)
Re: bogus: logical replication rows/cols combinations

On Fri, May 27, 2022 at 11:17:00AM +0530, Amit Kapila wrote:

On Tue, May 24, 2022 at 11:03 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:

On Friday, May 20, 2022 11:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thanks for pointing it out. Here is the new version patch which adds this version check.

I have added/edited a few comments and ran pgindent. The attached
looks good to me. I'll push this early next week unless there are more
comments/suggestions.

A minor doc review.
Note that I also sent some doc comments at 20220519120724.GO19626@telsasoft.com.

+      lists among publications in which case <command>ALTER PUBLICATION</command>
+      command will be successful but later the WalSender in publisher or the

COMMA in which

remove "command" ?

s/in publisher/on the publisher/

+   Subscription having several publications in which the same table has been
+   published with different column lists is not supported.

Either "Subscriptions having .. are not supported"; or,
"A subscription having .. is not supported".

#78houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Justin Pryzby (#77)
2 attachment(s)
RE: bogus: logical replication rows/cols combinations

On Friday, May 27, 2022 1:54 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, May 27, 2022 at 11:17:00AM +0530, Amit Kapila wrote:

On Tue, May 24, 2022 at 11:03 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:

On Friday, May 20, 2022 11:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thanks for pointing it out. Here is the new version patch which adds this version check.

I have added/edited a few comments and ran pgindent. The attached
looks good to me. I'll push this early next week unless there are more
comments/suggestions.

A minor doc review.
Note that I also sent some doc comments at 20220519120724.GO19626@telsasoft.com.

+      lists among publications in which case <command>ALTER PUBLICATION</command>
+      command will be successful but later the WalSender in publisher or the

COMMA in which

remove "command" ?

s/in publisher/on the publisher/

+   Subscription having several publications in which the same table has been
+   published with different column lists is not supported.

Either "Subscriptions having .. are not supported"; or,
"A subscription having .. is not supported".

Thanks for the comments. Here is the new version patch set which fixes these.

Best regards,
Hou zj

Attachments:

0001-language-fixes-on-HEAD-from-Justin.patch (application/octet-stream)
From 83a6e85100a173124396f8d4ac29ff170319f0a4 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Fri, 27 May 2022 15:24:52 +0800
Subject: [PATCH] language fixes from Justin

---
 doc/src/sgml/catalogs.sgml           | 6 +++---
 src/backend/catalog/pg_publication.c | 2 +-
 src/include/catalog/pg_proc.dat      | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c00c93d..9808180 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9701,7 +9701,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 
      <row>
       <entry><link linkend="view-pg-publication-tables"><structname>pg_publication_tables</structname></link></entry>
-      <entry>publications and information of their associated tables</entry>
+      <entry>publications and information about their associated tables</entry>
      </row>
 
      <row>
@@ -11645,7 +11645,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   <para>
    The view <structname>pg_publication_tables</structname> provides
-   information about the mapping between publications and information of
+   information about the mapping between publications and information about
    tables they contain.  Unlike the underlying catalog
    <link linkend="catalog-pg-publication-rel"><structname>pg_publication_rel</structname></link>,
    this view expands publications defined as <literal>FOR ALL TABLES</literal>
@@ -11705,7 +11705,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </para>
       <para>
        Names of table columns included in the publication. This contains all
-       the columns of the table when the user didn't specify the column list
+       the columns of the table when the user didn't specify a column list
        for the table.
       </para></entry>
      </row>
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 8c7fca6..2f706f6 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1077,7 +1077,7 @@ get_publication_name(Oid pubid, bool missing_ok)
 }
 
 /*
- * Returns information of tables in a publication.
+ * Returns information about tables in a publication.
  */
 Datum
 pg_get_publication_tables(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87aa571..86f1329 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11673,7 +11673,7 @@
   prosrc => 'pg_show_replication_origin_status' },
 
 # publications
-{ oid => '6119', descr => 'get information of tables in a publication',
+{ oid => '6119', descr => 'get information about tables in a publication',
   proname => 'pg_get_publication_tables', prorows => '1000', proretset => 't',
   provolatile => 's', prorettype => 'record', proargtypes => 'text',
   proallargtypes => '{text,oid,int2vector,pg_node_tree}', proargmodes => '{i,o,o,o}',
-- 
2.7.2.windows.1

v6-0001-Prohibit-combining-publications-with-different-co.patch (application/octet-stream)
From 201721776782809c3bb99bb6ff8fbdedb02483ab Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 27 May 2022 08:30:33 +0530
Subject: [PATCH v5] Prohibit combining publications with different column
 lists.

The main purpose of introducing a column list is to have statically
different shapes on publisher and subscriber or hide sensitive column
data. In both cases, it doesn't seem to make sense to combine column
lists.

So, we disallow the cases where the column list is different for the same
table when combining publications. It can be later extended to combine the
column lists for selective cases where required.

Reported-by: Alvaro Herrera
Author: Hou Zhijie
Reviewed-by: Amit Kapila, Alvaro Herrera, Shi Yu, Takamichi Osumi, Justin Pryzby
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
---
 doc/src/sgml/ref/alter_publication.sgml     |  12 ++-
 doc/src/sgml/ref/create_subscription.sgml   |   5 +
 src/backend/commands/subscriptioncmds.c     |  28 +++++-
 src/backend/replication/logical/tablesync.c |  72 ++++++-------
 src/backend/replication/pgoutput/pgoutput.c |  80 +++++++--------
 src/test/subscription/t/031_column_list.pl  | 150 ++++++++++++----------------
 6 files changed, 181 insertions(+), 166 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index e2cce49..f03933a 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -116,7 +116,17 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
 
      <para>
       Optionally, a column list can be specified.  See <xref
-      linkend="sql-createpublication"/> for details.
+      linkend="sql-createpublication"/> for details. Note that a subscription
+      having several publications in which the same table has been published
+      with different column lists is not supported. So, changing the column
+      lists of the tables being subscribed could cause inconsistency of column
+      lists among publications, in which case <command>ALTER PUBLICATION</command>
+      will be successful but later the WalSender on the publisher or the
+      subscriber may throw an error. In this scenario, the user needs to
+      recreate the subscription after adjusting the column list or drop the
+      problematic publication using
+      <literal>ALTER SUBSCRIPTION ... DROP PUBLICATION</literal> and then add
+      it back after adjusting the column list.
      </para>
 
      <para>
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 203bb41..f6f82a0 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -356,6 +356,11 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
   </para>
 
   <para>
+   Subscriptions having several publications in which the same table has been
+   published with different column lists are not supported.
+  </para>
+
+  <para>
    We allow non-existent publications to be specified so that users can add
    those later. This means
    <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 690cdaa..83e6eae 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1754,6 +1754,11 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
 /*
  * Get the list of tables which belong to specified publications on the
  * publisher connection.
+ *
+ * Note that we don't support the case where the column list is different for
+ * the same table in different publications to avoid sending unwanted column
+ * information for some of the rows. This can happen when both the column
+ * list and row filter are specified for different publications.
  */
 static List *
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
@@ -1761,17 +1766,23 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 	WalRcvExecResult *res;
 	StringInfoData cmd;
 	TupleTableSlot *slot;
-	Oid			tableRow[2] = {TEXTOID, TEXTOID};
+	Oid			tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
 	List	   *tablelist = NIL;
+	bool		check_columnlist = (walrcv_server_version(wrconn) >= 150000);
 
 	initStringInfo(&cmd);
-	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n"
-						   "  FROM pg_catalog.pg_publication_tables t\n"
+	appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");
+
+	/* Get column lists for each relation if the publisher supports it */
+	if (check_columnlist)
+		appendStringInfoString(&cmd, ", t.attnames\n");
+
+	appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n"
 						   " WHERE t.pubname IN (");
 	get_publications_str(publications, &cmd, true);
 	appendStringInfoChar(&cmd, ')');
 
-	res = walrcv_exec(wrconn, cmd.data, 2, tableRow);
+	res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow);
 	pfree(cmd.data);
 
 	if (res->status != WALRCV_OK_TUPLES)
@@ -1795,7 +1806,14 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 		Assert(!isnull);
 
 		rv = makeRangeVar(nspname, relname, -1);
-		tablelist = lappend(tablelist, rv);
+
+		if (check_columnlist && list_member(tablelist, rv))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+		else
+			tablelist = lappend(tablelist, rv);
 
 		ExecClearTuple(slot);
 	}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 994c7a0..670c6fc 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -753,17 +753,6 @@ fetch_remote_table_info(char *nspname, char *relname,
 	/*
 	 * Get column lists for each relation.
 	 *
-	 * For initial synchronization, column lists can be ignored in following
-	 * cases:
-	 *
-	 * 1) one of the subscribed publications for the table hasn't specified
-	 * any column list
-	 *
-	 * 2) one of the subscribed publications has puballtables set to true
-	 *
-	 * 3) one of the subscribed publications is declared as ALL TABLES IN
-	 * SCHEMA that includes this relation
-	 *
 	 * We need to do this before fetching info about column names and types,
 	 * so that we can skip columns that should not be replicated.
 	 */
@@ -771,7 +760,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	{
 		WalRcvExecResult *pubres;
 		TupleTableSlot *slot;
-		Oid			attrsRow[] = {INT2OID};
+		Oid			attrsRow[] = {INT2VECTOROID};
 		StringInfoData pub_names;
 		bool		first = true;
 
@@ -786,19 +775,17 @@ fetch_remote_table_info(char *nspname, char *relname,
 
 		/*
 		 * Fetch info about column lists for the relation (from all the
-		 * publications). We unnest the int2vector values, because that makes
-		 * it easier to combine lists by simply adding the attnums to a new
-		 * bitmap (without having to parse the int2vector data). This
-		 * preserves NULL values, so that if one of the publications has no
-		 * column list, we'll know that.
+		 * publications).
 		 */
 		resetStringInfo(&cmd);
 		appendStringInfo(&cmd,
-						 "SELECT DISTINCT unnest"
+						 "SELECT DISTINCT"
+						 "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
+						 "   THEN NULL ELSE gpt.attrs END)"
 						 "  FROM pg_publication p,"
-						 "  LATERAL pg_get_publication_tables(p.pubname) gpt"
-						 "  LEFT OUTER JOIN unnest(gpt.attrs) ON TRUE"
-						 " WHERE gpt.relid = %u"
+						 "  LATERAL pg_get_publication_tables(p.pubname) gpt,"
+						 "  pg_class c"
+						 " WHERE gpt.relid = %u AND c.oid = gpt.relid"
 						 "   AND p.pubname IN ( %s )",
 						 lrel->remoteid,
 						 pub_names.data);
@@ -813,26 +800,43 @@ fetch_remote_table_info(char *nspname, char *relname,
 							nspname, relname, pubres->err)));
 
 		/*
-		 * Merge the column lists (from different publications) by creating a
-		 * single bitmap with all the attnums. If we find a NULL value, that
-		 * means one of the publications has no column list for the table
-		 * we're syncing.
+		 * We don't support the case where the column list is different for
+		 * the same table when combining publications. See comments atop
+		 * fetch_table_list. So there should be only one row returned.
+		 * Although we already checked this when creating the subscription, we
+		 * still need to check here in case the column list was changed after
+		 * creating the subscription and before the sync worker is started.
+		 */
+		if (tuplestore_tuple_count(pubres->tuplestore) > 1)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   nspname, relname));
+
+		/*
+		 * Get the column list and build a single bitmap with the attnums.
+		 *
+		 * If we find a NULL value, it means all the columns should be
+		 * replicated.
 		 */
 		slot = MakeSingleTupleTableSlot(pubres->tupledesc, &TTSOpsMinimalTuple);
-		while (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
+		if (tuplestore_gettupleslot(pubres->tuplestore, true, false, slot))
 		{
 			Datum		cfval = slot_getattr(slot, 1, &isnull);
 
-			/* NULL means empty column list, so we're done. */
-			if (isnull)
+			if (!isnull)
 			{
-				bms_free(included_cols);
-				included_cols = NULL;
-				break;
-			}
+				ArrayType  *arr;
+				int			nelems;
+				int16	   *elems;
 
-			included_cols = bms_add_member(included_cols,
-										   DatumGetInt16(cfval));
+				arr = DatumGetArrayTypeP(cfval);
+				nelems = ARR_DIMS(arr)[0];
+				elems = (int16 *) ARR_DATA_PTR(arr);
+
+				for (natt = 0; natt < nelems; natt++)
+					included_cols = bms_add_member(included_cols, elems[natt]);
+			}
 
 			ExecClearTuple(slot);
 		}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 42c06af..8deae57 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -979,30 +979,31 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 						  RelationSyncEntry *entry)
 {
 	ListCell   *lc;
+	bool		first = true;
+	Relation	relation = RelationIdGetRelation(entry->publish_as_relid);
 
 	/*
 	 * Find if there are any column lists for this relation. If there are,
-	 * build a bitmap merging all the column lists.
-	 *
-	 * All the given publication-table mappings must be checked.
+	 * build a bitmap using the column lists.
 	 *
 	 * Multiple publications might have multiple column lists for this
 	 * relation.
 	 *
+	 * Note that we don't support the case where the column list is different
+	 * for the same table when combining publications. See comments atop
+	 * fetch_table_list. But one can later change the publication so we still
+	 * need to check all the given publication-table mappings and report an
+	 * error if any publications have a different column list.
+	 *
 	 * FOR ALL TABLES and FOR ALL TABLES IN SCHEMA implies "don't use column
-	 * list" so it takes precedence.
+	 * list".
 	 */
 	foreach(lc, publications)
 	{
 		Publication *pub = lfirst(lc);
 		HeapTuple	cftuple = NULL;
 		Datum		cfdatum = 0;
-
-		/*
-		 * Assume there's no column list. Only if we find pg_publication_rel
-		 * entry with a column list we'll switch it to false.
-		 */
-		bool		pub_no_list = true;
+		Bitmapset  *cols = NULL;
 
 		/*
 		 * If the publication is FOR ALL TABLES then it is treated the same as
@@ -1011,6 +1012,8 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 		 */
 		if (!pub->alltables)
 		{
+			bool		pub_no_list = true;
+
 			/*
 			 * Check for the presence of a column list in this publication.
 			 *
@@ -1024,51 +1027,48 @@ pgoutput_column_list_init(PGOutputData *data, List *publications,
 
 			if (HeapTupleIsValid(cftuple))
 			{
-				/*
-				 * Lookup the column list attribute.
-				 *
-				 * Note: We update the pub_no_list value directly, because if
-				 * the value is NULL, we have no list (and vice versa).
-				 */
+				/* Lookup the column list attribute. */
 				cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
 										  Anum_pg_publication_rel_prattrs,
 										  &pub_no_list);
 
-				/*
-				 * Build the column list bitmap in the per-entry context.
-				 *
-				 * We need to merge column lists from all publications, so we
-				 * update the same bitmapset. If the column list is null, we
-				 * interpret it as replicating all columns.
-				 */
+				/* Build the column list bitmap in the per-entry context. */
 				if (!pub_no_list)	/* when not null */
 				{
 					pgoutput_ensure_entry_cxt(data, entry);
 
-					entry->columns = pub_collist_to_bitmapset(entry->columns,
-															  cfdatum,
-															  entry->entry_cxt);
+					cols = pub_collist_to_bitmapset(cols, cfdatum,
+													entry->entry_cxt);
+
+					/*
+					 * If column list includes all the columns of the table,
+					 * set it to NULL.
+					 */
+					if (bms_num_members(cols) == RelationGetNumberOfAttributes(relation))
+					{
+						bms_free(cols);
+						cols = NULL;
+					}
 				}
+
+				ReleaseSysCache(cftuple);
 			}
 		}
 
-		/*
-		 * Found a publication with no column list, so we're done. But first
-		 * discard column list we might have from preceding publications.
-		 */
-		if (pub_no_list)
+		if (first)
 		{
-			if (cftuple)
-				ReleaseSysCache(cftuple);
-
-			bms_free(entry->columns);
-			entry->columns = NULL;
-
-			break;
+			entry->columns = cols;
+			first = false;
 		}
-
-		ReleaseSysCache(cftuple);
+		else if (!bms_equal(entry->columns, cols))
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot use different column lists for table \"%s.%s\" in different publications",
+						   get_namespace_name(RelationGetNamespace(relation)),
+						   RelationGetRelationName(relation)));
 	}							/* loop all subscribed publications */
+
+	RelationClose(relation);
 }
 
 /*
diff --git a/src/test/subscription/t/031_column_list.pl b/src/test/subscription/t/031_column_list.pl
index 19812e1..7f031bc 100644
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -20,6 +20,7 @@ $node_subscriber->append_conf('postgresql.conf',
 $node_subscriber->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+my $offset = 0;
 
 sub wait_for_subscription_sync
 {
@@ -361,13 +362,13 @@ is( $result, qq(1|abc
 2|xyz), 'update on column tab2.c is not replicated');
 
 
-# TEST: add a table to two publications with different column lists, and
+# TEST: add a table to two publications with same column lists, and
 # create a single subscription replicating both publications
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE tab5 (a int PRIMARY KEY, b int, c int, d int);
 	CREATE PUBLICATION pub2 FOR TABLE tab5 (a, b);
-	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, d);
+	CREATE PUBLICATION pub3 FOR TABLE tab5 (a, b);
 
 	-- insert a couple initial records
 	INSERT INTO tab5 VALUES (1, 11, 111, 1111);
@@ -388,8 +389,7 @@ wait_for_subscription_sync($node_subscriber);
 
 $node_publisher->wait_for_catchup('sub1');
 
-# insert data and make sure all the columns (union of the columns lists)
-# get fully replicated
+# insert data and make sure the columns in column list get fully replicated
 $node_publisher->safe_psql(
 	'postgres', qq(
 	INSERT INTO tab5 VALUES (3, 33, 333, 3333);
@@ -399,42 +399,12 @@ $node_publisher->safe_psql(
 $node_publisher->wait_for_catchup('sub1');
 
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111
-2|22|2222
-3|33|3333
-4|44|4444),
+	qq(1|11|
+2|22|
+3|33|
+4|44|),
 	'overlapping publications with overlapping column lists');
 
-# and finally, remove the column list for one of the publications, which
-# means replicating all columns (removing the column list), but first add
-# the missing column to the table on subscriber
-$node_publisher->safe_psql(
-	'postgres', qq(
-	ALTER PUBLICATION pub3 SET TABLE tab5;
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;
-	ALTER TABLE tab5 ADD COLUMN c INT;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO tab5 VALUES (5, 55, 555, 5555);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql('postgres', "SELECT * FROM tab5 ORDER BY a"),
-	qq(1|11|1111|
-2|22|2222|
-3|33|3333|
-4|44|4444|
-5|55|5555|555),
-	'overlapping publications with overlapping column lists');
 
 # TEST: create a table with a column list, then change the replica
 # identity by replacing a primary key (but use a different column in
@@ -900,57 +870,21 @@ is( $node_subscriber->safe_psql(
 3|),
 	'partitions with different replica identities not replicated correctly');
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. So with column lists (a,b) and (a,c) we
-# should replicate (a,b,c).
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
-	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
-
-	-- initial data
-	INSERT INTO test_mix_1 VALUES (1, 2, 3);
-));
-
-$node_subscriber->safe_psql(
-	'postgres', qq(
-	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
-	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_mix_1, pub_mix_2;
-));
-
-wait_for_subscription_sync($node_subscriber);
-
-$node_publisher->safe_psql(
-	'postgres', qq(
-	INSERT INTO test_mix_1 VALUES (4, 5, 6);
-));
-
-$node_publisher->wait_for_catchup('sub1');
-
-is( $node_subscriber->safe_psql(
-		'postgres', "SELECT * FROM test_mix_1 ORDER BY a"),
-	qq(1|2|3
-4|5|6),
-	'a mix of publications should use a union of column list');
-
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES, we should replicate all columns.
+# TEST: With a table included in the publications which is FOR ALL TABLES, it
+# means replicate all columns.
 
 # drop unnecessary tables, so as not to interfere with the FOR ALL TABLES
 $node_publisher->safe_psql(
 	'postgres', qq(
-	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7, test_mix_1,
+	DROP TABLE tab1, tab2, tab3, tab4, tab5, tab6, tab7,
 			   test_part, test_part_a, test_part_b, test_part_c, test_part_d;
 ));
 
 $node_publisher->safe_psql(
 	'postgres', qq(
 	CREATE TABLE test_mix_2 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b);
+	CREATE PUBLICATION pub_mix_3 FOR TABLE test_mix_2 (a, b, c);
 	CREATE PUBLICATION pub_mix_4 FOR ALL TABLES;
 
 	-- initial data
@@ -976,12 +910,11 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_2"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
-# TEST: With a table included in multiple publications, we should use a
-# union of the column lists. If any of the publications is FOR ALL
-# TABLES IN SCHEMA, we should replicate all columns.
+# TEST: With a table included in the publication which is FOR ALL TABLES
+# IN SCHEMA, it means replicate all columns.
 
 $node_subscriber->safe_psql(
 	'postgres', qq(
@@ -993,7 +926,7 @@ $node_publisher->safe_psql(
 	'postgres', qq(
 	DROP TABLE test_mix_2;
 	CREATE TABLE test_mix_3 (a int PRIMARY KEY, b int, c int);
-	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b);
+	CREATE PUBLICATION pub_mix_5 FOR TABLE test_mix_3 (a, b, c);
 	CREATE PUBLICATION pub_mix_6 FOR ALL TABLES IN SCHEMA public;
 
 	-- initial data
@@ -1017,7 +950,7 @@ $node_publisher->wait_for_catchup('sub1');
 is( $node_subscriber->safe_psql('postgres', "SELECT * FROM test_mix_3"),
 	qq(1|2|3
 4|5|6),
-	'a mix of publications should use a union of column list');
+	'all columns should be replicated');
 
 
 # TEST: Check handling of publish_via_partition_root - if a partition is
@@ -1074,7 +1007,7 @@ is( $node_subscriber->safe_psql(
 # TEST: Multiple publications which publish schema of parent table and
 # partition. The partition is published through two publications, once
 # through a schema (so no column list) containing the parent, and then
-# also directly (with a columns list). The expected outcome is there is
+# also directly (with all columns). The expected outcome is there is
 # no column list.
 
 $node_publisher->safe_psql(
@@ -1086,7 +1019,7 @@ $node_publisher->safe_psql(
 	CREATE TABLE t_1 PARTITION OF s1.t FOR VALUES FROM (1) TO (10);
 
 	CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA s1;
-	CREATE PUBLICATION pub2 FOR TABLE t_1(b);
+	CREATE PUBLICATION pub2 FOR TABLE t_1(a, b, c);
 
 	-- initial data
 	INSERT INTO s1.t VALUES (1, 2, 3);
@@ -1233,6 +1166,51 @@ is( $node_subscriber->safe_psql(
 	'publication containing both parent and child relation');
 
 
+# TEST: With a table included in multiple publications with different column
+# lists, we should catch the error when creating the subscription.
+
+$node_publisher->safe_psql('postgres', qq(
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+	CREATE PUBLICATION pub_mix_1 FOR TABLE test_mix_1 (a, b);
+	CREATE PUBLICATION pub_mix_2 FOR TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	DROP SUBSCRIPTION sub1;
+	CREATE TABLE test_mix_1 (a int PRIMARY KEY, b int, c int);
+));
+
+my ($cmdret, $stdout, $stderr) = $node_subscriber->psql(
+	'postgres', qq(
+	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_1, pub_mix_2;
+));
+
+ok( $stderr =~
+	  qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	'different column lists detected');
+
+# TEST: If the column list is changed after creating the subscription, we
+# should catch the error reported by walsender.
+
+$node_publisher->safe_psql('postgres', qq(
+	ALTER PUBLICATION pub_mix_1 SET TABLE test_mix_1 (a, c);
+));
+
+$node_subscriber->safe_psql('postgres', qq(
+	CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_1, pub_mix_2;
+));
+
+$node_publisher->wait_for_catchup('sub1');
+
+$node_publisher->safe_psql('postgres', qq(
+	ALTER PUBLICATION pub_mix_1 SET TABLE test_mix_1 (a, b);
+	INSERT INTO test_mix_1 VALUES(1, 1, 1);
+));
+
+$offset = $node_publisher->wait_for_log(
+	qr/cannot use different column lists for table "public.test_mix_1" in different publications/,
+	$offset);
+
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
 
-- 
1.8.3.1

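For readers skimming the patch: the consistency check added in pgoutput_column_list_init (and mirrored in fetch_table_list and fetch_remote_table_info) boils down to "every subscribed publication must contribute the same column list for a given table, where no list means all columns". A minimal sketch of that logic in Python — illustrative only, not PostgreSQL code; the function name and representation (sets of attribute numbers, None for "all columns") are assumptions for the sketch:

```python
def combine_column_lists(per_pub_cols):
    """Sketch of the check added by the patch.

    per_pub_cols: one entry per subscribed publication for a table,
    each either a set of attribute numbers or None (None stands for
    "replicate all columns", e.g. FOR ALL TABLES or no column list;
    the patch also normalizes a list naming every column to NULL).
    Returns the common column list, or raises if the publications
    disagree -- mirroring bms_equal() plus ereport(ERROR) in
    pgoutput_column_list_init.
    """
    result = None
    first = True
    for cols in per_pub_cols:
        if first:
            result, first = cols, False
        elif result != cols:
            raise ValueError(
                "cannot use different column lists for table "
                "in different publications")
    return result
```

So with publications (a, b) and (a, b) the subscription works, while (a, b) and (a, c) now fail up front instead of silently replicating a union of the columns.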
#79Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#78)
Re: bogus: logical replication rows/cols combinations

On Fri, May 27, 2022 at 1:04 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Friday, May 27, 2022 1:54 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, May 27, 2022 at 11:17:00AM +0530, Amit Kapila wrote:

On Tue, May 24, 2022 at 11:03 AM houzj.fnst@fujitsu.com

<houzj.fnst@fujitsu.com> wrote:

On Friday, May 20, 2022 11:06 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

Thanks for pointing it out. Here is the new version patch which add this

version check.

I have added/edited a few comments and ran pgindent. The attached
looks good to me. I'll push this early next week unless there are more
comments/suggestions.

A minor doc review.
Note that I also sent some doc comments at
20220519120724.GO19626@telsasoft.com.

+      lists among publications in which case <command>ALTER
PUBLICATION</command>
+      command will be successful but later the WalSender in publisher
+ or the

COMMA in which

remove "command" ?

s/in publisher/on the publisher/

+   Subscription having several publications in which the same table has been
+   published with different column lists is not supported.

Either "Subscriptions having .. are not supported"; or, "A subscription having ..
is not supported".

Thanks for the comments. Here is the new version patch set which fixes these.

I have pushed the bug-fix patch. I'll look at the language
improvements patch next.

--
With Regards,
Amit Kapila.

#80Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#79)
Re: bogus: logical replication rows/cols combinations

On Thu, Jun 2, 2022 at 9:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, May 27, 2022 at 1:04 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Friday, May 27, 2022 1:54 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, May 27, 2022 at 11:17:00AM +0530, Amit Kapila wrote:

On Tue, May 24, 2022 at 11:03 AM houzj.fnst@fujitsu.com

<houzj.fnst@fujitsu.com> wrote:

On Friday, May 20, 2022 11:06 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

Thanks for pointing it out. Here is the new version patch which add this

version check.

I have added/edited a few comments and ran pgindent. The attached
looks good to me. I'll push this early next week unless there are more
comments/suggestions.

A minor doc review.
Note that I also sent some doc comments at
20220519120724.GO19626@telsasoft.com.

+      lists among publications in which case <command>ALTER
PUBLICATION</command>
+      command will be successful but later the WalSender in publisher
+ or the

COMMA in which

remove "command" ?

s/in publisher/on the publisher/

+   Subscription having several publications in which the same table has been
+   published with different column lists is not supported.

Either "Subscriptions having .. are not supported"; or, "A subscription having ..
is not supported".

Thanks for the comments. Here is the new version patch set which fixes these.

I have pushed the bug-fix patch. I'll look at the language
improvements patch next.

I noticed the patch "0001-language-fixes-on-HEAD-from-Justin.patch" says:

@@ -11673,7 +11673,7 @@
prosrc => 'pg_show_replication_origin_status' },

 # publications
-{ oid => '6119', descr => 'get information of tables in a publication',
+{ oid => '6119', descr => 'get information about tables in a publication',

~~~

But, this grammar website [1] says:

What Does Of Mean
As defined by Cambridge dictionary Of is basically used “to show
possession, belonging, or origin”.

What Does About Mean
Similarly about primarily indicates ‘On the subject of; concerning’ as
defined by the Oxford dictionary. Or about in brief highlights some
fact ‘on the subject of, or connected with’

The main difference between of and about is that of implies a
possessive quality while about implies concerning or on the subject of
something.

~~~

From which I guess

1. 'get information of tables in a publication' ~= 'get information
belonging to tables in a publication'

2. 'get information about tables in a publication' ~= 'get information
on the subject of tables in a publication'

The 'pg_publication_tables' view contains various attributes
(tablename, attnames, rowfilter, etc) BELONGING TO each table of the
publication, so the current description (using 'of') was already the
more accurate one wasn't it?

------
[1]: https://pediaa.com/difference-between-of-and-about/

Kind Regards,
Peter Smith.
Fujitsu Australia

#81Justin Pryzby
pryzby@telsasoft.com
In reply to: Peter Smith (#80)
Re: bogus: logical replication rows/cols combinations

On Mon, Jun 06, 2022 at 03:42:31PM +1000, Peter Smith wrote:

I noticed the patch "0001-language-fixes-on-HEAD-from-Justin.patch" says:

@@ -11673,7 +11673,7 @@
prosrc => 'pg_show_replication_origin_status' },

# publications
-{ oid => '6119', descr => 'get information of tables in a publication',
+{ oid => '6119', descr => 'get information about tables in a publication',

~~~

But, this grammar website [1] says:

...

From which I guess

1. 'get information of tables in a publication' ~= 'get information
belonging to tables in a publication'

But the information doesn't "belong to" the tables.

The information is "regarding" the tables (or "associated with" or "concerned
with" or "respecting" or "on the subject of" the tables).

I think my change is correct based on the grammar definition, as well as its
intuitive "feel".

--
Justin

#82Peter Smith
smithpb2250@gmail.com
In reply to: Justin Pryzby (#81)
Re: bogus: logical replication rows/cols combinations

On Wed, Jun 8, 2022 at 1:25 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Mon, Jun 06, 2022 at 03:42:31PM +1000, Peter Smith wrote:

I noticed the patch "0001-language-fixes-on-HEAD-from-Justin.patch" says:

@@ -11673,7 +11673,7 @@
prosrc => 'pg_show_replication_origin_status' },

# publications
-{ oid => '6119', descr => 'get information of tables in a publication',
+{ oid => '6119', descr => 'get information about tables in a publication',

~~~

But, this grammar website [1] says:

...

From which I guess

1. 'get information of tables in a publication' ~= 'get information
belonging to tables in a publication'

But the information doesn't "belong to" the tables.

The information is "regarding" the tables (or "associated with" or "concerned
with" or "respecting" or "on the subject of" the tables).

I think my change is correct based on the grammar definition, as well as its
intuitive "feel".

Actually, I have no problem with this being worded either way. My
point was mostly to question if it was really worth changing it at
this time - e.g. I think there is a reluctance to change anything to
do with the catalogs during beta (even when a catversion bump may not
be required).

I agree that "about" seems better if the text said, "get information
about tables". But it does not say that - it says "get information
about tables in a publication" which I felt made a subtle difference.

e.g.1 "... on the subject of / concerned with tables."
- sounds like attributes about each table (col names, row filter etc)

versus

e.g.2 "... on the subject of / concerned with tables in a publication."
- sounds less like information PER table, and more like information
about the table membership of the publication.

~~

Any ambiguities can be eliminated if this text was just fixed to be
consistent with the wording of catalogs.sgml:
e.g. "publications and information about their associated tables"

But then this comes full circle back to my question if during beta is
a good time to be making such a change.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#83Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#82)
Re: bogus: logical replication rows/cols combinations

On Wed, Jun 8, 2022 at 11:05 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, Jun 8, 2022 at 1:25 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Mon, Jun 06, 2022 at 03:42:31PM +1000, Peter Smith wrote:

I noticed the patch "0001-language-fixes-on-HEAD-from-Justin.patch" says:

@@ -11673,7 +11673,7 @@
prosrc => 'pg_show_replication_origin_status' },

# publications
-{ oid => '6119', descr => 'get information of tables in a publication',
+{ oid => '6119', descr => 'get information about tables in a publication',

~~~

But, this grammar website [1] says:

...

From which I guess

1. 'get information of tables in a publication' ~= 'get information
belonging to tables in a publication'

But the information doesn't "belong to" the tables.

The information is "regarding" the tables (or "associated with" or "concerned
with" or "respecting" or "on the subject of" the tables).

I think my change is correct based on the grammar definition, as well as its
intuitive "feel".

Actually, I have no problem with this being worded either way. My
point was mostly to question if it was really worth changing it at
this time - e.g. I think there is a reluctance to change anything to
do with the catalogs during beta (even when a catversion bump may not
be required).

I agree that "about" seems better if the text said, "get information
about tables". But it does not say that - it says "get information
about tables in a publication" which I felt made a subtle difference.

e.g.1 "... on the subject of / concerned with tables."
- sounds like attributes about each table (col names, row filter etc)

versus

e.g.2 "... on the subject of / concerned with tables in a publication."
- sounds less like information PER table, and more like information
about the table membership of the publication.

~~

Any ambiguities can be eliminated if this text was just fixed to be
consistent with the wording of catalogs.sgml:
e.g. "publications and information about their associated tables"

I don't know if this is better than the current text for this view:
'get information of tables in a publication' and unless we have a
consensus on any change here, I think it is better to retain the
current text as it is.

I would like to close the Open item listed corresponding to this
thread [1] as the fix for the reported issue is committed
(fd0b9dcebd). Do let me know if you or others think otherwise?

[1]: https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items

--
With Regards,
Amit Kapila.

#84Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#83)
Re: bogus: logical replication rows/cols combinations

On Mon, Jun 13, 2022 at 8:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I would like to close the Open item listed corresponding to this
thread [1] as the fix for the reported issue is committed
(fd0b9dcebd). Do let me know if you or others think otherwise?

Seeing no objections, I have closed this item.

--
With Regards,
Amit Kapila.