tablesync copy ignores publication actions

Started by Peter Smithover 3 years ago13 messages
#1Peter Smith
smithpb2250@gmail.com
1 attachment(s)

The logical replication tablesync ignores the publication 'publish'
operations during the initial data copy.

This is current/known PG behaviour (e.g. as recently mentioned [1]/messages/by-id/CAA4eK1L_98LF7Db4yFY1PhKKRzoT83xtN41jTS5X+8OeGrAkLw@mail.gmail.com)
but it was not documented anywhere.

This patch just documents the existing behaviour and gives some examples.

------
[1]: /messages/by-id/CAA4eK1L_98LF7Db4yFY1PhKKRzoT83xtN41jTS5X+8OeGrAkLw@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v1-0001-PGDOCS-tablesync-ignores-publish-operation.patchapplication/octet-stream; name=v1-0001-PGDOCS-tablesync-ignores-publish-operation.patchDownload
From fc425a59b2f17427861033cb03df86b36b60db89 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 13 May 2022 18:18:27 +1000
Subject: [PATCH v1] PGDOCS tablesync ignores publish operation

The tablesync ignores the publication 'publish' operations during the initial data copy.

This is existing PG behaviour but it was not documented before.
This patch documents the behaviour and gives some examples.
---
 doc/src/sgml/logical-replication.sgml | 90 ++++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 145ea71..8dfc8f7 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -459,6 +459,9 @@
     published with different <literal>WHERE</literal> clauses, rows that satisfy
     <emphasis>any</emphasis> of the expressions will be copied. See
     <xref linkend="logical-replication-row-filter-combining"/> for details.
+    Note that during the initial data synchronizaton (unlike normal replication)
+    the <literal>publish</literal> operation of the publisher is ignored. See
+    <xref linkend="logical-replication-snapshot"/> for details.
    </para>
 
    <note>
@@ -500,7 +503,8 @@
       </para>
      </listitem>
     </itemizedlist></para>
-
+    <para>
+    </para>
   </sect2>
 
   <sect2 id="logical-replication-row-filter-examples">
@@ -1095,7 +1099,91 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
      replication of the table is given back to the main apply process where
      replication continues as normal.
     </para>
+    <note>
+     <para>
+      Publication <literal>publish</literal> operations (
+      <literal>insert</literal>, <literal>update</literal>, <literal>delete</literal>,
+      ...) are ignored during the initial data copy.
+     </para>
+    </note>
+  </sect2>
+
+  <sect2 id="logical-replication-snapshot-example">
+    <title>Example</title>
+
+    <para>
+     Create some test tables on the publisher.
+<programlisting>
+test_pub=# CREATE TABLE t1(a int, b text, PRIMARY KEY(a));
+CREATE TABLE
+test_pub=# CREATE TABLE t2(c int, d text, PRIMARY KEY(c));
+CREATE TABLE
+</programlisting></para>
+
+    <para>
+     Create the same tables on the subscriber.
+<programlisting>
+test_sub=# CREATE TABLE t1(a int, b text, PRIMARY KEY(a));
+CREATE TABLE
+test_sub=# CREATE TABLE t2(c int, d text, PRIMARY KEY(c));
+CREATE TABLE
+</programlisting></para>
+
+    <para>
+     Insert data to the tables at the publisher side.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
+INSERT 0 3
+test_pub=# INSERT INTO t2 VALUES (1, 'A'), (2, 'B'), (3, 'C');
+INSERT 0 3
+</programlisting></para>
+
+    <para>
+     Create publications for the tables. The publication <literal>pub2</literal>
+     limits the <literal>publish</literal> operations.
+<programlisting>
+test_pub=# CREATE PUBLICATION pub1 FOR TABLE t1;
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub2 FOR TABLE t2 WITH (publish = 'truncate');
+CREATE PUBLICATION
+</programlisting></para>
+
+    <para>
+     Create subscriptions to each of those publications. Subscriptions
+     will copy initial data by default.
+<programlisting>
+test_sub=# CREATE SUBSCRIPTION sub1
+test_sub-#CONNECTION 'host=localhost dbname=test_pub application_name=sub1'
+test_sub-#PUBLICATION pub1;
+CREATE SUBSCRIPTION
+test_sub=# CREATE SUBSCRIPTION sub2
+test_sub-#CONNECTION 'host=localhost dbname=test_pub application_name=sub2'
+test_sub-#PUBLICATION pub2;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+    <para>
+     Observe that initial table data is copied, regardless of the
+     <literal>publish</literal> operation of the publication.
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+(3 rows)
+
+test_sub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+(3 rows)
+</programlisting></para>
   </sect2>
+
  </sect1>
 
  <sect1 id="logical-replication-monitoring">
-- 
1.8.3.1

#2Euler Taveira
euler@eulerto.com
In reply to: Peter Smith (#1)
Re: tablesync copy ignores publication actions

On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote:

The logical replication tablesync ignores the publication 'publish'
operations during the initial data copy.

This is current/known PG behaviour (e.g. as recently mentioned [1])
but it was not documented anywhere.

initial data synchronization != replication. publish parameter is a replication
property; it is not a initial data synchronization property. Maybe we should
make it clear like you are suggesting.

This patch just documents the existing behaviour and gives some examples.

Why did you add this information to that specific paragraph? IMO it belongs to
a separate paragraph; I would add it as the first paragraph in that subsection.

I suggest the following paragraph:

<para>
The initial data synchronization does not take into account the
<literal>publish</literal> parameter to copy the existing data.
</para>

There is no point to link the Initial Snapshot subsection. That subsection is
explaining the initial copy steps and you want to inform about the effect of a
publication parameter on the initial copy. Although both are talking about the
same topic (initial copy), that link to Initial Snapshot subsection won't add
additional information about the publish parameter. You could expand the
suggested sentence to make it clear what publish parameter is or even add a
link to the CREATE PUBLICATION synopsis (that explains about publish
parameter).

You add an empty paragraph. Remove it.

I'm not sure it deserves an example. It is an easy-to-understand concept and a
good description is better than ~ 80 new lines.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#2)
Re: tablesync copy ignores publication actions

On Tue, Jun 7, 2022 at 7:08 PM Euler Taveira <euler@eulerto.com> wrote:

On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote:

The logical replication tablesync ignores the publication 'publish'
operations during the initial data copy.

This is current/known PG behaviour (e.g. as recently mentioned [1])
but it was not documented anywhere.

initial data synchronization != replication. publish parameter is a replication
property; it is not a initial data synchronization property. Maybe we should
make it clear like you are suggesting.

+1 to document it. We respect some other properties of publication
like the publish_via_partition_root parameter, column lists, and row
filters. So it is better to explain about 'publish' parameter which we
ignore during the initial sync.

This patch just documents the existing behaviour and gives some examples.

Why did you add this information to that specific paragraph? IMO it belongs to
a separate paragraph; I would add it as the first paragraph in that subsection.

I suggest the following paragraph:

<para>
The initial data synchronization does not take into account the
<literal>publish</literal> parameter to copy the existing data.
</para>

There is no point to link the Initial Snapshot subsection. That subsection is
explaining the initial copy steps and you want to inform about the effect of a
publication parameter on the initial copy. Although both are talking about the
same topic (initial copy), that link to Initial Snapshot subsection won't add
additional information about the publish parameter.

Here, we are explaining the behavior of row filters during initial
sync so adding a link to the Initial Snapshot section makes sense to
me.

You could expand the
suggested sentence to make it clear what publish parameter is or even add a
link to the CREATE PUBLICATION synopsis (that explains about publish
parameter).

+1. I suggest that we should add some text about the behavior of
initial sync in CREATE PUBLICATION doc (along with the 'publish'
parameter) or otherwise, we can explain it where we are talking about
publications [1]https://www.postgresql.org/docs/devel/logical-replication-publication.html.

You add an empty paragraph. Remove it.

I'm not sure it deserves an example. It is an easy-to-understand concept and a
good description is better than ~ 80 new lines.

I don't think it is very clear that "initial data synchronization !=
replication" as mentioned by you nor does our docs does a good job in
explaining it otherwise the confusion wouldn't have arisen in the
email link shared by Peter. Personally, I think such things can be
better explained by example and in that regards the example shared by
Peter does half the job because it doesn't explain the replication
part. I don't think "Initial Snapshot" is the right place for these
examples considering we want to show the replication based on the
publish actions. We can extend it to show one example with row filters
as well. How about showing these examples in the Subscription section
[2]: https://www.postgresql.org/docs/devel/logical-replication-subscription.html

[1]: https://www.postgresql.org/docs/devel/logical-replication-publication.html
[2]: https://www.postgresql.org/docs/devel/logical-replication-subscription.html

--
With Regards,
Amit Kapila.

#4shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Kapila (#3)
RE: tablesync copy ignores publication actions

On Wed, Jun 8, 2022 12:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 7, 2022 at 7:08 PM Euler Taveira <euler@eulerto.com> wrote:

On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote:

The logical replication tablesync ignores the publication 'publish'
operations during the initial data copy.

This is current/known PG behaviour (e.g. as recently mentioned [1])
but it was not documented anywhere.

initial data synchronization != replication. publish parameter is a replication
property; it is not a initial data synchronization property. Maybe we should
make it clear like you are suggesting.

+1 to document it. We respect some other properties of publication
like the publish_via_partition_root parameter, column lists, and row
filters. So it is better to explain about 'publish' parameter which we
ignore during the initial sync.

I also agree to add it to the document.

You could expand the
suggested sentence to make it clear what publish parameter is or even add

a

link to the CREATE PUBLICATION synopsis (that explains about publish
parameter).

+1. I suggest that we should add some text about the behavior of
initial sync in CREATE PUBLICATION doc (along with the 'publish'
parameter) or otherwise, we can explain it where we are talking about
publications [1].

I noticed that CREATE SUBSCRIPTION doc mentions that row filter will affect
initial sync along with "copy_data" parameter.[1]https://www.postgresql.org/docs/devel/sql-createsubscription.html Maybe we can add some text for
"publish" parameter there.

[1]: https://www.postgresql.org/docs/devel/sql-createsubscription.html

Regards,
Shi yu

#5Peter Smith
smithpb2250@gmail.com
In reply to: Peter Smith (#1)
1 attachment(s)
Re: tablesync copy ignores publication actions

PSA v2 of the patch, based on all feedback received.

~~~

Main differences from v1:

* Rewording and more explanatory text.

* The examples were moved to the "Subscription" [1]https://www.postgresql.org/docs/devel/logical-replication-subscription.html page and also
extended to show some normal replication and row filter examples, from
[Amit].

* Added some text to CREATE PUBLICATION 'publish' param [2]https://www.postgresql.org/docs/devel/sql-createpublication.html, from [Euler][Amit].

* Added some text to CREATE SUBSCRIPTION Notes [3]https://www.postgresql.org/docs/devel/sql-createsubscription.html, from [Shi-san].

* Added some text to the "Publication page" [4]https://www.postgresql.org/docs/devel/logical-replication-publication.html to say the 'publish'
is only for DML operations.

* I changed the note in "Row Filter - Initial Data Synchronization"
[5]: https://www.postgresql.org/docs/devel/logical-replication-row-filter.html#LOGICAL-REPLICATION-ROW-FILTER-INITIAL-DATA-SYNC
data exposed by the initial copy, which a DML operation would not
expose.

------
[1]: https://www.postgresql.org/docs/devel/logical-replication-subscription.html
[2]: https://www.postgresql.org/docs/devel/sql-createpublication.html
[3]: https://www.postgresql.org/docs/devel/sql-createsubscription.html
[4]: https://www.postgresql.org/docs/devel/logical-replication-publication.html
[5]: https://www.postgresql.org/docs/devel/logical-replication-row-filter.html#LOGICAL-REPLICATION-ROW-FILTER-INITIAL-DATA-SYNC

[Euler] /messages/by-id/bd49c14d-7a01-4ae3-b424-8c49630fec57@www.fastmail.com
[Amit] /messages/by-id/CAA4eK1Lb5QpWCQU8qkELnX6t8z7JeVtGantmKptxkkpxnYnpHA@mail.gmail.com
[Shi-san] /messages/by-id/OSZPR01MB631026B8428422EAC1BFB8A4FDAA9@OSZPR01MB6310.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v2-0001-PGDOCS-tablesync-ignores-publish-operation.patchapplication/octet-stream; name=v2-0001-PGDOCS-tablesync-ignores-publish-operation.patchDownload
From aa87a43c7ba2282439ef6328a4a1734fdfe8672b Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 14 Jun 2022 17:25:55 +1000
Subject: [PATCH v2] PGDOCS tablesync ignores publish operation

The logical replication initial data synchronization (tablesync) does not
take into account the  publication 'publish' parameter when copying the
existing table data.

This has long been PG behaviour but it was not documented before.

This patch documents it and gives some examples.
---
 doc/src/sgml/logical-replication.sgml     | 218 +++++++++++++++++++++++++++++-
 doc/src/sgml/ref/create_publication.sgml  |   5 +
 doc/src/sgml/ref/create_subscription.sgml |   4 +
 3 files changed, 223 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 145ea71..42e30ab 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -117,9 +117,10 @@
    Publications can choose to limit the changes they produce to
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
-   (Row filters have no effect for <command>TRUNCATE</command>. See
-   <xref linkend="logical-replication-row-filter"/>).
+   particular event types. By default, all operation types are replicated.
+   These are DML operation limitations only; they do not affect the initial
+   data synchronization copy. (Row filters have no effect for
+   <command>TRUNCATE</command>. See <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +318,199 @@
     </itemizedlist>
    </para>
   </sect2>
+
+  <sect2 id="logical-replication-subscription-examples">
+    <title>Examples</title>
+
+    <para>
+     Create some test tables on the publisher.
+<programlisting>
+test_pub=# CREATE TABLE t1(a int, b text, PRIMARY KEY(a));
+CREATE TABLE
+test_pub=# CREATE TABLE t2(c int, d text, PRIMARY KEY(c));
+CREATE TABLE
+test_pub=# CREATE TABLE t3(e int, f text, PRIMARY KEY(e));
+CREATE TABLE
+</programlisting></para>
+
+    <para>
+     Create the same tables on the subscriber.
+<programlisting>
+test_sub=# CREATE TABLE t1(a int, b text, PRIMARY KEY(a));
+CREATE TABLE
+test_sub=# CREATE TABLE t2(c int, d text, PRIMARY KEY(c));
+CREATE TABLE
+test_sub=# CREATE TABLE t3(e int, f text, PRIMARY KEY(e));
+CREATE TABLE
+</programlisting></para>
+
+    <para>
+     Insert data to the tables at the publisher side.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
+INSERT 0 3
+test_pub=# INSERT INTO t2 VALUES (1, 'A'), (2, 'B'), (3, 'C');
+INSERT 0 3
+test_pub=# INSERT INTO t3 VALUES (1, 'i'), (2, 'ii'), (3, 'iii');
+INSERT 0 3
+</programlisting></para>
+
+    <para>
+     Create publications for the tables. The publications <literal>pub2</literal>
+     and <literal>pub3a</literal> disallow some <literal>publish</literal>
+     operations. The publication <literal>pub3b</literal> has a row filter.
+<programlisting>
+test_pub=# CREATE PUBLICATION pub1 FOR TABLE t1;
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub2 FOR TABLE t2 WITH (publish = 'truncate');
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub3a FOR TABLE t3 WITH (publish = 'truncate');
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub3b FOR TABLE t3 WHERE (e > 5);
+CREATE PUBLICATION
+</programlisting></para>
+
+    <para>
+     Create subscriptions for the publications. The subscription
+     <literal>sub3</literal> subscribes to both <literal>pub3a</literal> and
+     <literal>pub3b</literal>. All subscriptions will copy initial data by default.
+<programlisting>
+test_sub=# CREATE SUBSCRIPTION sub1
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=sub1'
+test_sub-# PUBLICATION pub1;
+CREATE SUBSCRIPTION
+test_sub=# CREATE SUBSCRIPTION sub2
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=sub2'
+test_sub-# PUBLICATION pub2;
+CREATE SUBSCRIPTION
+test_sub=# CREATE SUBSCRIPTION sub3
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=sub3'
+test_sub-# PUBLICATION pub3a, pub3b;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+    <para>
+     Observe that initial table data is copied, regardless of the
+     <literal>publish</literal> operation of the publication.
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+(3 rows)
+
+test_sub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+(3 rows)
+</programlisting></para>
+
+    <para>
+     Furthermore, because the initial data copy ignores the <literal>publish</literal>
+     operation, and because publication <literal>pub3a</literal> has no row filter,
+     it means the copied table <literal>t3</literal> contains all rows even when
+     they do not patch the row filter of publication <literal>pub3b</literal>.
+<programlisting>
+test_sub=# SELECT * FROM t3;
+ e |  f
+---+-----
+ 1 | i
+ 2 | ii
+ 3 | iii
+(3 rows)
+</programlisting></para>
+
+   <para>
+    Insert more data to the tables at the publisher side.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (4, 'four'), (5, 'five'), (6, 'six');
+INSERT 0 3
+test_pub=# INSERT INTO t2 VALUES (4, 'D'), (5, 'E'), (6, 'F');
+INSERT 0 3
+test_pub=# INSERT INTO t3 VALUES (4, 'iv'), (5, 'v'), (6, 'vi');
+INSERT 0 3
+</programlisting></para>
+
+   <para>
+    Now the publisher side data looks like:
+<programlisting>
+test_pub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+ 4 | four
+ 5 | five
+ 6 | six
+(6 rows)
+
+test_pub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+ 4 | D
+ 5 | E
+ 6 | F
+(6 rows)
+
+test_pub=# SELECT * FROM t3;
+ e |  f
+---+-----
+ 1 | i
+ 2 | ii
+ 3 | iii
+ 4 | iv
+ 5 | v
+ 6 | vi
+(6 rows)
+</programlisting></para>
+
+   <para>
+    Observe that during normal replication the appropriate
+    <literal>publish</literal> operations are used. This means publications
+    <literal>pub2</literal> and <literal>pub3a</literal> will not replicate the
+    <literal>INSERT</literal>. Also, publication <literal>pub3b</literal> will
+    only replicate data that matches the row filter of <literal>pub3b</literal>.
+    Now the subscriber side data looks like:
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+ 4 | four
+ 5 | five
+ 6 | six
+(6 rows)
+
+test_sub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+(3 rows)
+
+test_sub=# SELECT * FROM t3;
+ e |  f
+---+-----
+ 1 | i
+ 2 | ii
+ 3 | iii
+ 6 | vi
+(4 rows)
+</programlisting></para>
+  </sect2>
+
  </sect1>
 
  <sect1 id="logical-replication-row-filter">
@@ -461,6 +655,16 @@
     <xref linkend="logical-replication-row-filter-combining"/> for details.
    </para>
 
+   <warning>
+    <para>
+     Because initial data synchronization does not take into account the
+     <literal>publish</literal> parameter when copying existing table data,
+     some rows may be copied that would not be replicated using DML. Refer to
+     <xref linkend="logical-replication-snapshot"/>, and see
+     <xref linkend="logical-replication-subscription-examples"/> for examples.
+    </para>
+   </warning>
+
    <note>
     <para>
      If the subscriber is in a release prior to 15, copy pre-existing data
@@ -500,7 +704,6 @@
       </para>
      </listitem>
     </itemizedlist></para>
-
   </sect2>
 
   <sect2 id="logical-replication-row-filter-examples">
@@ -1095,6 +1298,13 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
      replication of the table is given back to the main apply process where
      replication continues as normal.
     </para>
+    <note>
+     <para>
+      The publication <literal>publish</literal> parameter only affects what
+      DML operations will be replicated. The initial data synchronization does
+      not take this parameter into account when copying the existing table data.
+     </para>
+    </note>
   </sect2>
  </sect1>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index 1a828e8..92e1045 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -177,6 +177,11 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
           and so the default value for this option is
           <literal>'insert, update, delete, truncate'</literal>.
          </para>
+         <para>
+          This parameter only affects DML operations. In particular, the
+          subscription initial data synchronization does not take this parameter
+          into account when copying existing table data.
+         </para>
         </listitem>
        </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 35b39c2..34b3264 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -353,6 +353,10 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
    15 then any row filtering is ignored during the initial data synchronization
    phase. For this case, the user might want to consider deleting any initially
    copied data that would be incompatible with subsequent filtering.
+   Because initial data synchronization does not take into account the publication
+   <literal>publish</literal> parameter when copying existing table data, some rows
+   may be copied that would not be replicated using DML. See
+   <xref linkend="logical-replication-subscription-examples"/> for examples.
   </para>
 
   <para>
-- 
1.8.3.1

#6shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Peter Smith (#5)
RE: tablesync copy ignores publication actions

On Tue, Jun 14, 2022 3:36 PM Peter Smith <smithpb2250@gmail.com> wrote:

PSA v2 of the patch, based on all feedback received.

~~~

Main differences from v1:

* Rewording and more explanatory text.

* The examples were moved to the "Subscription" [1] page and also
extended to show some normal replication and row filter examples, from
[Amit].

* Added some text to CREATE PUBLICATION 'publish' param [2], from
[Euler][Amit].

* Added some text to CREATE SUBSCRIPTION Notes [3], from [Shi-san].

* Added some text to the "Publication page" [4] to say the 'publish'
is only for DML operations.

* I changed the note in "Row Filter - Initial Data Synchronization"
[5] to be a warning because I felt users could be surprised to see
data exposed by the initial copy, which a DML operation would not
expose.

Thanks for updating the patch. Two comments:

1.
+     it means the copied table <literal>t3</literal> contains all rows even when
+     they do not patch the row filter of publication <literal>pub3b</literal>.

Typo. I think "they do not patch the row filter" should be "they do not match
the row filter", right?

2.
@@ -500,7 +704,6 @@
</para>
</listitem>
</itemizedlist></para>
-
</sect2>

<sect2 id="logical-replication-row-filter-examples">

It seems we should remove this change.

Regards,
Shi yu

#7Peter Smith
smithpb2250@gmail.com
In reply to: shiy.fnst@fujitsu.com (#6)
1 attachment(s)
Re: tablesync copy ignores publication actions

On Wed, Jun 15, 2022 at 5:05 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

...

Thanks for updating the patch. Two comments:

1.
+     it means the copied table <literal>t3</literal> contains all rows even when
+     they do not patch the row filter of publication <literal>pub3b</literal>.

Typo. I think "they do not patch the row filter" should be "they do not match
the row filter", right?

2.
@@ -500,7 +704,6 @@
</para>
</listitem>
</itemizedlist></para>
-
</sect2>

<sect2 id="logical-replication-row-filter-examples">

It seems we should remove this change.

Thank you for your review comments. Those reported mistakes are fixed
in the attached patch v3.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v3-0001-PGDOCS-tablesync-ignores-publish-operation.patchapplication/octet-stream; name=v3-0001-PGDOCS-tablesync-ignores-publish-operation.patchDownload
From f8ed3810395564fdd82b8d7f48f7f27d1a2d3d08 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 16 Jun 2022 10:18:52 +1000
Subject: [PATCH v3] PGDOCS tablesync ignores publish operation

The logical replication initial data synchronization (tablesync) does not
take into account the  publication 'publish' parameter when copying the
existing table data.

This has long been PG behaviour but it was not documented before.

This patch documents it and gives some examples.
---
 doc/src/sgml/logical-replication.sgml     | 217 +++++++++++++++++++++++++++++-
 doc/src/sgml/ref/create_publication.sgml  |   5 +
 doc/src/sgml/ref/create_subscription.sgml |   4 +
 3 files changed, 223 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 145ea71..e7de76d 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -117,9 +117,10 @@
    Publications can choose to limit the changes they produce to
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
-   (Row filters have no effect for <command>TRUNCATE</command>. See
-   <xref linkend="logical-replication-row-filter"/>).
+   particular event types. By default, all operation types are replicated.
+   These are DML operation limitations only; they do not affect the initial
+   data synchronization copy. (Row filters have no effect for
+   <command>TRUNCATE</command>. See <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +318,199 @@
     </itemizedlist>
    </para>
   </sect2>
+
+  <sect2 id="logical-replication-subscription-examples">
+    <title>Examples</title>
+
+    <para>
+     Create some test tables on the publisher.
+<programlisting>
+test_pub=# CREATE TABLE t1(a int, b text, PRIMARY KEY(a));
+CREATE TABLE
+test_pub=# CREATE TABLE t2(c int, d text, PRIMARY KEY(c));
+CREATE TABLE
+test_pub=# CREATE TABLE t3(e int, f text, PRIMARY KEY(e));
+CREATE TABLE
+</programlisting></para>
+
+    <para>
+     Create the same tables on the subscriber.
+<programlisting>
+test_sub=# CREATE TABLE t1(a int, b text, PRIMARY KEY(a));
+CREATE TABLE
+test_sub=# CREATE TABLE t2(c int, d text, PRIMARY KEY(c));
+CREATE TABLE
+test_sub=# CREATE TABLE t3(e int, f text, PRIMARY KEY(e));
+CREATE TABLE
+</programlisting></para>
+
+    <para>
+     Insert data to the tables at the publisher side.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
+INSERT 0 3
+test_pub=# INSERT INTO t2 VALUES (1, 'A'), (2, 'B'), (3, 'C');
+INSERT 0 3
+test_pub=# INSERT INTO t3 VALUES (1, 'i'), (2, 'ii'), (3, 'iii');
+INSERT 0 3
+</programlisting></para>
+
+    <para>
+     Create publications for the tables. The publications <literal>pub2</literal>
+     and <literal>pub3a</literal> disallow some <literal>publish</literal>
+     operations. The publication <literal>pub3b</literal> has a row filter.
+<programlisting>
+test_pub=# CREATE PUBLICATION pub1 FOR TABLE t1;
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub2 FOR TABLE t2 WITH (publish = 'truncate');
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub3a FOR TABLE t3 WITH (publish = 'truncate');
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub3b FOR TABLE t3 WHERE (e > 5);
+CREATE PUBLICATION
+</programlisting></para>
+
+    <para>
+     Create subscriptions for the publications. The subscription
+     <literal>sub3</literal> subscribes to both <literal>pub3a</literal> and
+     <literal>pub3b</literal>. All subscriptions will copy initial data by default.
+<programlisting>
+test_sub=# CREATE SUBSCRIPTION sub1
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=sub1'
+test_sub-# PUBLICATION pub1;
+CREATE SUBSCRIPTION
+test_sub=# CREATE SUBSCRIPTION sub2
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=sub2'
+test_sub-# PUBLICATION pub2;
+CREATE SUBSCRIPTION
+test_sub=# CREATE SUBSCRIPTION sub3
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=sub3'
+test_sub-# PUBLICATION pub3a, pub3b;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+    <para>
+     Observe that initial table data is copied, regardless of the
+     <literal>publish</literal> operation of the publication.
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+(3 rows)
+
+test_sub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+(3 rows)
+</programlisting></para>
+
+    <para>
+     Furthermore, because the initial data copy ignores the <literal>publish</literal>
+     operation, and because publication <literal>pub3a</literal> has no row filter,
+     it means the copied table <literal>t3</literal> contains all rows even when
+     they do not match the row filter of publication <literal>pub3b</literal>.
+<programlisting>
+test_sub=# SELECT * FROM t3;
+ e |  f
+---+-----
+ 1 | i
+ 2 | ii
+ 3 | iii
+(3 rows)
+</programlisting></para>
+
+   <para>
+    Insert more data to the tables at the publisher side.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (4, 'four'), (5, 'five'), (6, 'six');
+INSERT 0 3
+test_pub=# INSERT INTO t2 VALUES (4, 'D'), (5, 'E'), (6, 'F');
+INSERT 0 3
+test_pub=# INSERT INTO t3 VALUES (4, 'iv'), (5, 'v'), (6, 'vi');
+INSERT 0 3
+</programlisting></para>
+
+   <para>
+    Now the publisher side data looks like:
+<programlisting>
+test_pub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+ 4 | four
+ 5 | five
+ 6 | six
+(6 rows)
+
+test_pub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+ 4 | D
+ 5 | E
+ 6 | F
+(6 rows)
+
+test_pub=# SELECT * FROM t3;
+ e |  f
+---+-----
+ 1 | i
+ 2 | ii
+ 3 | iii
+ 4 | iv
+ 5 | v
+ 6 | vi
+(6 rows)
+</programlisting></para>
+
+   <para>
+    Observe that during normal replication the appropriate
+    <literal>publish</literal> operations are used. This means publications
+    <literal>pub2</literal> and <literal>pub3a</literal> will not replicate the
+    <literal>INSERT</literal>. Also, publication <literal>pub3b</literal> will
+    only replicate data that matches the row filter of <literal>pub3b</literal>.
+    Now the subscriber side data looks like:
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+ 4 | four
+ 5 | five
+ 6 | six
+(6 rows)
+
+test_sub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+(3 rows)
+
+test_sub=# SELECT * FROM t3;
+ e |  f
+---+-----
+ 1 | i
+ 2 | ii
+ 3 | iii
+ 6 | vi
+(4 rows)
+</programlisting></para>
+  </sect2>
+
  </sect1>
 
  <sect1 id="logical-replication-row-filter">
@@ -461,6 +655,16 @@
     <xref linkend="logical-replication-row-filter-combining"/> for details.
    </para>
 
+   <warning>
+    <para>
+     Because initial data synchronization does not take into account the
+     <literal>publish</literal> parameter when copying existing table data,
+     some rows may be copied that would not be replicated using DML. Refer to
+     <xref linkend="logical-replication-snapshot"/>, and see
+     <xref linkend="logical-replication-subscription-examples"/> for examples.
+    </para>
+   </warning>
+
    <note>
     <para>
      If the subscriber is in a release prior to 15, copy pre-existing data
@@ -1095,6 +1299,13 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
      replication of the table is given back to the main apply process where
      replication continues as normal.
     </para>
+    <note>
+     <para>
+      The publication <literal>publish</literal> parameter only affects what
+      DML operations will be replicated. The initial data synchronization does
+      not take this parameter into account when copying the existing table data.
+     </para>
+    </note>
   </sect2>
  </sect1>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index 1a828e8..92e1045 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -177,6 +177,11 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
           and so the default value for this option is
           <literal>'insert, update, delete, truncate'</literal>.
          </para>
+         <para>
+          This parameter only affects DML operations. In particular, the
+          subscription initial data synchronization does not take this parameter
+          into account when copying existing table data.
+         </para>
         </listitem>
        </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 35b39c2..34b3264 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -353,6 +353,10 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
    15 then any row filtering is ignored during the initial data synchronization
    phase. For this case, the user might want to consider deleting any initially
    copied data that would be incompatible with subsequent filtering.
+   Because initial data synchronization does not take into account the publication
+   <literal>publish</literal> parameter when copying existing table data, some rows
+   may be copied that would not be replicated using DML. See
+   <xref linkend="logical-replication-subscription-examples"/> for examples.
   </para>
 
   <para>
-- 
1.8.3.1

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#7)
Re: tablesync copy ignores publication actions

On Thu, Jun 16, 2022 at 6:07 AM Peter Smith <smithpb2250@gmail.com> wrote:

Thank you for your review comments. Those reported mistakes are fixed
in the attached patch v3.

This patch looks mostly good to me except for a few minor comments
which are mentioned below. It is not very clear in which branch(es) we
should commit this patch? As per my understanding, this is a
pre-existing behavior but we want to document it because (a) It was
not already documented, and (b) we followed it for row filters in
PG-15 it seems that should be explained. So, we have the following
options (a) commit it only for PG-15, (b) commit for PG-15 and
backpatch the relevant sections, or (c) commit it when branch opens
for PG-16. What do you or others think?

Few comments:
==============
1.

-   particular event types.  By default, all operation types are replicated.
-   (Row filters have no effect for <command>TRUNCATE</command>. See
-   <xref linkend="logical-replication-row-filter"/>).
+   particular event types. By default, all operation types are replicated.
+   These are DML operation limitations only; they do not affect the initial
+   data synchronization copy.

Using limitations in the above sentence can be misleading. Can we
change it to something like: "These publication specifications apply
only for DML operations; they do ... ".

2.
+ operations. The publication <literal>pub3b</literal> has a row filter.

In the Examples section, you have used row filter whereas that section
is later in the docs. So, it is better if you give reference to that
section in the above sentence (see Section ...).

3.
+         <para>
+          This parameter only affects DML operations. In particular, the
+          subscription initial data synchronization does not take
this parameter
+          into account when copying existing table data.
+         </para>

In the second sentence: "... subscription initial data synchronization
..." doesn't sound appropriate. Can we change it to something like:
"In particular, the initial data synchronization (see Section ..) in
logical replication does not take this parameter into account when
copying existing table data."?

--
With Regards,
Amit Kapila.

#9Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#8)
1 attachment(s)
Re: tablesync copy ignores publication actions

On Wed, Jun 22, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 16, 2022 at 6:07 AM Peter Smith <smithpb2250@gmail.com> wrote:

Thank you for your review comments. Those reported mistakes are fixed
in the attached patch v3.

This patch looks mostly good to me except for a few minor comments
which are mentioned below. It is not very clear in which branch(es) we
should commit this patch? As per my understanding, this is a
pre-existing behavior but we want to document it because (a) It was
not already documented, and (b) we followed it for row filters in
PG-15 it seems that should be explained. So, we have the following
options (a) commit it only for PG-15, (b) commit for PG-15 and
backpatch the relevant sections, or (c) commit it when branch opens
for PG-16. What do you or others think?

Even though this is a very old docs omission, AFAIK nobody ever raised
it as a problem before. It only became more important because of the
PG15 row-filters. So I think option (a) is ok.

Few comments:
==============
1.

-   particular event types.  By default, all operation types are replicated.
-   (Row filters have no effect for <command>TRUNCATE</command>. See
-   <xref linkend="logical-replication-row-filter"/>).
+   particular event types. By default, all operation types are replicated.
+   These are DML operation limitations only; they do not affect the initial
+   data synchronization copy.

Using limitations in the above sentence can be misleading. Can we
change it to something like: "These publication specifications apply
only for DML operations; they do ... ".

OK - modified as suggested.

2.
+ operations. The publication <literal>pub3b</literal> has a row filter.

In the Examples section, you have used row filter whereas that section
is later in the docs. So, it is better if you give reference to that
section in the above sentence (see Section ...).

OK - added xref as suggested.

3.
+         <para>
+          This parameter only affects DML operations. In particular, the
+          subscription initial data synchronization does not take
this parameter
+          into account when copying existing table data.
+         </para>

In the second sentence: "... subscription initial data synchronization
..." doesn't sound appropriate. Can we change it to something like:
"In particular, the initial data synchronization (see Section ..) in
logical replication does not take this parameter into account when
copying existing table data."?

OK - modified and added xref as suggested.

~~

PSA patch v4 to address all the above review comments.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v4-0001-PGDOCS-tablesync-ignores-publish-operation.patchapplication/octet-stream; name=v4-0001-PGDOCS-tablesync-ignores-publish-operation.patchDownload
From 3560735c0c4799f087c4bffadf7c413438c9dc7f Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 22 Jun 2022 18:39:31 +1000
Subject: [PATCH v4] PGDOCS tablesync ignores publish operation

The logical replication initial data synchronization (tablesync) does not
take into account the  publication 'publish' parameter when copying the
existing table data.

This has long been PG behaviour but it was not documented before.

This patch documents it and gives some examples.
---
 doc/src/sgml/logical-replication.sgml     | 218 +++++++++++++++++++++++++++++-
 doc/src/sgml/ref/create_publication.sgml  |   6 +
 doc/src/sgml/ref/create_subscription.sgml |   4 +
 3 files changed, 225 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 145ea71..bdf1e7b 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -117,9 +117,10 @@
    Publications can choose to limit the changes they produce to
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
-   (Row filters have no effect for <command>TRUNCATE</command>. See
-   <xref linkend="logical-replication-row-filter"/>).
+   particular event types. By default, all operation types are replicated.
+   These publication specifications apply only for DML operations; they do not affect the initial
+   data synchronization copy. (Row filters have no effect for
+   <command>TRUNCATE</command>. See <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +318,200 @@
     </itemizedlist>
    </para>
   </sect2>
+
+  <sect2 id="logical-replication-subscription-examples">
+    <title>Examples</title>
+
+    <para>
+     Create some test tables on the publisher.
+<programlisting>
+test_pub=# CREATE TABLE t1(a int, b text, PRIMARY KEY(a));
+CREATE TABLE
+test_pub=# CREATE TABLE t2(c int, d text, PRIMARY KEY(c));
+CREATE TABLE
+test_pub=# CREATE TABLE t3(e int, f text, PRIMARY KEY(e));
+CREATE TABLE
+</programlisting></para>
+
+    <para>
+     Create the same tables on the subscriber.
+<programlisting>
+test_sub=# CREATE TABLE t1(a int, b text, PRIMARY KEY(a));
+CREATE TABLE
+test_sub=# CREATE TABLE t2(c int, d text, PRIMARY KEY(c));
+CREATE TABLE
+test_sub=# CREATE TABLE t3(e int, f text, PRIMARY KEY(e));
+CREATE TABLE
+</programlisting></para>
+
+    <para>
+     Insert data to the tables at the publisher side.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
+INSERT 0 3
+test_pub=# INSERT INTO t2 VALUES (1, 'A'), (2, 'B'), (3, 'C');
+INSERT 0 3
+test_pub=# INSERT INTO t3 VALUES (1, 'i'), (2, 'ii'), (3, 'iii');
+INSERT 0 3
+</programlisting></para>
+
+    <para>
+     Create publications for the tables. The publications <literal>pub2</literal>
+     and <literal>pub3a</literal> disallow some <literal>publish</literal>
+     operations. The publication <literal>pub3b</literal> has a row filter (see
+     <xref linkend="logical-replication-row-filter"/>).
+<programlisting>
+test_pub=# CREATE PUBLICATION pub1 FOR TABLE t1;
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub2 FOR TABLE t2 WITH (publish = 'truncate');
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub3a FOR TABLE t3 WITH (publish = 'truncate');
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION pub3b FOR TABLE t3 WHERE (e > 5);
+CREATE PUBLICATION
+</programlisting></para>
+
+    <para>
+     Create subscriptions for the publications. The subscription
+     <literal>sub3</literal> subscribes to both <literal>pub3a</literal> and
+     <literal>pub3b</literal>. All subscriptions will copy initial data by default.
+<programlisting>
+test_sub=# CREATE SUBSCRIPTION sub1
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=sub1'
+test_sub-# PUBLICATION pub1;
+CREATE SUBSCRIPTION
+test_sub=# CREATE SUBSCRIPTION sub2
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=sub2'
+test_sub-# PUBLICATION pub2;
+CREATE SUBSCRIPTION
+test_sub=# CREATE SUBSCRIPTION sub3
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=sub3'
+test_sub-# PUBLICATION pub3a, pub3b;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+    <para>
+     Observe that initial table data is copied, regardless of the
+     <literal>publish</literal> operation of the publication.
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+(3 rows)
+
+test_sub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+(3 rows)
+</programlisting></para>
+
+    <para>
+     Furthermore, because the initial data copy ignores the <literal>publish</literal>
+     operation, and because publication <literal>pub3a</literal> has no row filter,
+     it means the copied table <literal>t3</literal> contains all rows even when
+     they do not match the row filter of publication <literal>pub3b</literal>.
+<programlisting>
+test_sub=# SELECT * FROM t3;
+ e |  f
+---+-----
+ 1 | i
+ 2 | ii
+ 3 | iii
+(3 rows)
+</programlisting></para>
+
+   <para>
+    Insert more data to the tables at the publisher side.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (4, 'four'), (5, 'five'), (6, 'six');
+INSERT 0 3
+test_pub=# INSERT INTO t2 VALUES (4, 'D'), (5, 'E'), (6, 'F');
+INSERT 0 3
+test_pub=# INSERT INTO t3 VALUES (4, 'iv'), (5, 'v'), (6, 'vi');
+INSERT 0 3
+</programlisting></para>
+
+   <para>
+    Now the publisher side data looks like:
+<programlisting>
+test_pub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+ 4 | four
+ 5 | five
+ 6 | six
+(6 rows)
+
+test_pub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+ 4 | D
+ 5 | E
+ 6 | F
+(6 rows)
+
+test_pub=# SELECT * FROM t3;
+ e |  f
+---+-----
+ 1 | i
+ 2 | ii
+ 3 | iii
+ 4 | iv
+ 5 | v
+ 6 | vi
+(6 rows)
+</programlisting></para>
+
+   <para>
+    Observe that during normal replication the appropriate
+    <literal>publish</literal> operations are used. This means publications
+    <literal>pub2</literal> and <literal>pub3a</literal> will not replicate the
+    <literal>INSERT</literal>. Also, publication <literal>pub3b</literal> will
+    only replicate data that matches the row filter of <literal>pub3b</literal>.
+    Now the subscriber side data looks like:
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |   b
+---+-------
+ 1 | one
+ 2 | two
+ 3 | three
+ 4 | four
+ 5 | five
+ 6 | six
+(6 rows)
+
+test_sub=# SELECT * FROM t2;
+ c | d
+---+---
+ 1 | A
+ 2 | B
+ 3 | C
+(3 rows)
+
+test_sub=# SELECT * FROM t3;
+ e |  f
+---+-----
+ 1 | i
+ 2 | ii
+ 3 | iii
+ 6 | vi
+(4 rows)
+</programlisting></para>
+  </sect2>
+
  </sect1>
 
  <sect1 id="logical-replication-row-filter">
@@ -461,6 +656,16 @@
     <xref linkend="logical-replication-row-filter-combining"/> for details.
    </para>
 
+   <warning>
+    <para>
+     Because initial data synchronization does not take into account the
+     <literal>publish</literal> parameter when copying existing table data,
+     some rows may be copied that would not be replicated using DML. Refer to
+     <xref linkend="logical-replication-snapshot"/>, and see
+     <xref linkend="logical-replication-subscription-examples"/> for examples.
+    </para>
+   </warning>
+
    <note>
     <para>
      If the subscriber is in a release prior to 15, copy pre-existing data
@@ -1095,6 +1300,13 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
      replication of the table is given back to the main apply process where
      replication continues as normal.
     </para>
+    <note>
+     <para>
+      The publication <literal>publish</literal> parameter only affects what
+      DML operations will be replicated. The initial data synchronization does
+      not take this parameter into account when copying the existing table data.
+     </para>
+    </note>
   </sect2>
  </sect1>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index 1a828e8..5790d76 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -177,6 +177,12 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
           and so the default value for this option is
           <literal>'insert, update, delete, truncate'</literal>.
          </para>
+         <para>
+          This parameter only affects DML operations. In particular, the initial
+          data synchronization (see <xref linkend="logical-replication-snapshot"/>)
+          for logical replication does not take this parameter into account when
+          copying existing table data.
+         </para>
         </listitem>
        </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 35b39c2..34b3264 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -353,6 +353,10 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
    15 then any row filtering is ignored during the initial data synchronization
    phase. For this case, the user might want to consider deleting any initially
    copied data that would be incompatible with subsequent filtering.
+   Because initial data synchronization does not take into account the publication
+   <literal>publish</literal> parameter when copying existing table data, some rows
+   may be copied that would not be replicated using DML. See
+   <xref linkend="logical-replication-subscription-examples"/> for examples.
   </para>
 
   <para>
-- 
1.8.3.1

#10shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Peter Smith (#9)
RE: tablesync copy ignores publication actions

On Wed, Jun 22, 2022 4:49 PM Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, Jun 22, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, Jun 16, 2022 at 6:07 AM Peter Smith <smithpb2250@gmail.com>

wrote:

Thank you for your review comments. Those reported mistakes are fixed
in the attached patch v3.

This patch looks mostly good to me except for a few minor comments
which are mentioned below. It is not very clear in which branch(es) we
should commit this patch? As per my understanding, this is a
pre-existing behavior but we want to document it because (a) It was
not already documented, and (b) we followed it for row filters in
PG-15 it seems that should be explained. So, we have the following
options (a) commit it only for PG-15, (b) commit for PG-15 and
backpatch the relevant sections, or (c) commit it when branch opens
for PG-16. What do you or others think?

Even though this is a very old docs omission, AFAIK nobody ever raised
it as a problem before. It only became more important because of the
PG15 row-filters. So I think option (a) is ok.

I also think option (a) is ok.

PSA patch v4 to address all the above review comments.

Thanks for updating the patch. It looks good to me.

Besides, I tested the examples in the patch, and there's no problem.

Regards,
Shi yu

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: shiy.fnst@fujitsu.com (#10)
Re: tablesync copy ignores publication actions

On Thu, Jun 23, 2022 at 8:43 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Wed, Jun 22, 2022 4:49 PM Peter Smith <smithpb2250@gmail.com> wrote:

This patch looks mostly good to me except for a few minor comments
which are mentioned below. It is not very clear in which branch(es) we
should commit this patch? As per my understanding, this is a
pre-existing behavior but we want to document it because (a) It was
not already documented, and (b) we followed it for row filters in
PG-15 it seems that should be explained. So, we have the following
options (a) commit it only for PG-15, (b) commit for PG-15 and
backpatch the relevant sections, or (c) commit it when branch opens
for PG-16. What do you or others think?

Even though this is a very old docs omission, AFAIK nobody ever raised
it as a problem before. It only became more important because of the
PG15 row-filters. So I think option (a) is ok.

I also think option (a) is ok.

PSA patch v4 to address all the above review comments.

Thanks for updating the patch. It looks good to me.

The patch looks good to me as well. I will push this patch in HEAD (as
per option (a)) tomorrow unless I see any more suggestions/comments.

--
With Regards,
Amit Kapila.

#12Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#11)
Re: tablesync copy ignores publication actions

On Thu, Jun 23, 2022 at 2:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

The patch looks good to me as well. I will push this patch in HEAD (as
per option (a)) tomorrow unless I see any more suggestions/comments.

The example seems to demonstrate the point quite well but one thing
that I notice is that it is quite long. I don't really see an obvious
way of making it shorter without making it less clear, so perhaps
that's fine.

--
Robert Haas
EDB: http://www.enterprisedb.com

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#12)
Re: tablesync copy ignores publication actions

On Fri, Jun 24, 2022 at 2:09 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 23, 2022 at 2:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

The patch looks good to me as well. I will push this patch in HEAD (as
per option (a)) tomorrow unless I see any more suggestions/comments.

The example seems to demonstrate the point quite well but one thing
that I notice is that it is quite long. I don't really see an obvious
way of making it shorter without making it less clear, so perhaps
that's fine.

Thanks for looking into it. Pushed!

--
With Regards,
Amit Kapila.