PG DOCS - logical replication filtering

amit.kapila16@gmail.com

almost 4 years ago

In reply to: Aleksander Alekseev (#2)

Re: PG DOCS - logical replication filtering

On Wed, Mar 2, 2022 at 2:37 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

I see that a large part of the documentation is commented and marked as TBA (Column Filters, Combining Different Kinds of Filters). Could you please clarify if it's a work-in-progress patch? If it's not, I believe the commented part should be removed before committing.

I think we can remove any Column Filters related information
(placeholders), if that patch gets committed, we can always extend the
existing docs.

--
With Regards,
Amit Kapila.

Aleksander Alekseev

aleksander@timescale.com

almost 4 years ago

In reply to: Amit Kapila (#4)

1 attachment(s)

Re: PG DOCS - logical replication filtering

Hi hackers,

I see that a large part of the documentation is commented and marked as
TBA (Column Filters, Combining Different Kinds of Filters). Could you
please clarify if it's a work-in-progress patch? If it's not, I believe the
commented part should be removed before committing.

I think we can remove any Column Filters related information
(placeholders), if that patch gets committed, we can always extend the
existing docs.

Here is an updated version of the patch.

--
Best regards,
Aleksander Alekseev

Attachments:

v2-0001-Update-the-documentation-for-logical-replication.patchapplication/octet-stream; name=v2-0001-Update-the-documentation-for-logical-replication.patchDownload

From e39bd692eb3adeb8029b1ff409aca49b861d46d9 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 2 Mar 2022 15:31:45 +1100
Subject: [PATCH] Update the documentation for logical replication.

This patch introduces a new documentation page for giving an overview of
what kinds of logical replication filtering are available, and how to use them.
The main new content addition to this page is the section describing
"Row Filtering".

Author: Peter Smith
Reviewed By: Greg Nancarrow, Aleksander Alekseev, Amit Kapila
Discussion: https://www.postgresql.org/message-id/CAHut%2BPtnsBr59%3D_NvxXp_%3DS-em0WxyuDOQmSTuHGb4sVhkHffg%40mail.gmail.com
---
 doc/src/sgml/logical-replication.sgml    | 353 ++++++++++++++++++++++-
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 353 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index fb4472356d..88ecf2b914 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -116,8 +116,10 @@
   <para>
    Publications can choose to limit the changes they produce to
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
-   <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
+   <command>DELETE</command>, and <command>TRUNCATE</command> by using
+   <quote>operation filters</quote>. By default, all operation types are
+   replicated. See <xref linkend="logical-replication-operation-filtering"/>
+   for details.
   </para>
 
   <para>
@@ -317,6 +319,353 @@
   </sect2>
  </sect1>
 
+ <sect1 id="logical-replication-filtering">
+  <title>Filtering</title>
+
+   <para>
+    By default, all data from all published tables will be replicated to the
+    appropriate subscribers.
+   </para>
+
+   <para>
+    The default data replication can be reduced by using filters. A user might
+    choose to use filters for behavioral or for performance reasons.
+   </para>
+
+   <para>
+    There are 2 different ways to filter what data gets replicated.
+   </para>
+
+  <sect2 id="logical-replication-operation-filtering">
+   <title>Operation Filters</title>
+
+   <para>
+    Publications can choose to limit the changes they produce to any combination
+    of <command>INSERT</command>, <command>UPDATE</command>, <command>DELETE</command>,
+    and <command>TRUNCATE</command>, similar to how triggers are fired by
+    particular event types.
+   </para>
+
+   <para>
+    By default, all operation types are replicated.
+   </para>
+
+   <para>
+    Use the <literal>PUBLICATION ...<link linkend="sql-createpublication">
+    WITH (publish = <replaceable class="parameter">value</replaceable>)
+    </link></literal> clause to modify which operations will do replication.
+   </para>
+
+   <sect3 id="logical-replication-opf-examples">
+    <title>Examples</title>
+
+    <para>
+     Create a publication that only publishes <command>INSERT</command>
+     operations in one table:
+<programlisting>
+CREATE PUBLICATION insert_only FOR TABLE mydata WITH (publish = 'insert');
+</programlisting>
+    </para>
+
+   </sect3>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filtering">
+   <title>Row Filters</title>
+
+   <para>
+    With row filtering, the decision whether to replicate row data depends on
+    the row data values. Rows that don't satisfy an optional <literal>WHERE</literal>
+    clause will be filtered out. This allows a database or set of tables to be
+    partially replicated.
+   </para>
+
+   <para>
+    Use the <literal>PUBLICATION ... FOR TABLE </literal>
+    <link linkend="sql-createpublication"><literal>
+    <replaceable class="parameter">table_name</replaceable> WHERE (
+    <replaceable class="parameter">expression</replaceable>)</literal></link>
+    to specify which rows will be replicated.
+   </para>
+
+   <para>
+    The row filtering is defined per table. A new row filter can be added simply
+    by specifying a <literal>WHERE</literal> clause after the table name. The
+    <literal>WHERE</literal> clause must be enclosed by parentheses.
+   </para>
+
+   <sect3 id="logical-replication-rf-rules">
+    <title>Row filter Rules</title>
+
+    <para>
+     If the row filter evaluates to <literal>false</literal> or
+     <literal>NULL</literal> then the row is not replicated.
+    </para>
+
+    <para>
+     Row filters are applied <emphasis>before</emphasis> publishing the changes.
+    </para>
+
+    <para>
+     The <literal>WHERE</literal> clause expression is evaluated with the same
+     role used for the replication connection.
+    </para>
+
+    <para>
+     Row filters have no effect for <command>TRUNCATE</command> commands.
+    </para>
+
+   </sect3>
+
+   <sect3 id="logical-replication-rf-transformations">
+    <title>UPDATE transformations</title>
+
+    <para>
+     Whenever an <command>UPDATE</command> is processed, the row filter
+     expression is evaluated for both the old and new row (i.e. before
+     and after the data is updated).
+    </para>
+
+    <para>
+     If both evaluations are <literal>true</literal>, it replicates the
+     <command>UPDATE</command>.
+    </para>
+
+    <para>
+     If both evaluations are <literal>false</literal>, it doesn't replicate
+     anything.
+    </para>
+
+    <para>
+     If only one of the old/new rows matches the row filter expression, the
+     <command>UPDATE</command> is transformed to <command>INSERT</command> or
+     <command>DELETE</command>, to avoid any data inconsistency, as follows:
+<synopsis>
+Case 1: old-row (no match)    new-row (no match)   -->   (drop change)
+Case 2: old-row (no match)    new row (match)      -->   INSERT
+Case 3: old-row (match)       new-row (no match)   -->   DELETE
+Case 4: old-row (match)       new row (match)      -->   UPDATE
+</synopsis>
+    </para>
+
+   </sect3>
+
+   <sect3 id="logical-replication-rf-partitions">
+    <title>Partition tables</title>
+
+    <para>
+     If the publication contains a partitioned table, the publication parameter
+     <literal>publish_via_partition_root</literal> determines which row filter
+     is used.
+     <itemizedlist>
+
+      <listitem>
+       <para>
+        If <literal>publish_via_partition_root</literal> is <literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        If <literal>publish_via_partition_root</literal> is <literal>true</literal>,
+        the <emphasis>root partitioned table's</emphasis> row filter is used.
+       </para>
+      </listitem>
+
+     </itemizedlist>
+    </para>
+
+   </sect3>
+
+   <sect3 id="logical-replication-rf-restrictions">
+    <title>WHERE clause Expression Restrictions</title>
+
+    <para>
+     The row filter <literal>WHERE</literal> clause for a table added to a
+     publication that publishes <command>UPDATE</command> and/or
+     <command>DELETE</command> operations must contain only columns that are
+     covered by the <quote>replica identity</quote>. The row filter
+     <literal>WHERE</literal> clause for a table added to a publication that
+     publishes <command>INSERT</command> can use any column.
+    </para>
+
+    <para>
+     The <literal>WHERE</literal> clause allows only simple expressions that
+     don't have user-defined functions, operators, non-immutable built-in
+     functions, or references to system columns.
+    </para>
+
+   </sect3>
+
+   <sect3 id="logical-replication-rf-tablesync">
+    <title>Subscriber initial table synchronization</title>
+
+    <para>
+     If you choose to do the initial table synchronization, only data that
+     satisfies the row filters is copied to the subscriber.
+    </para>
+
+    <para>
+     If the subscription has several publications in which a table has been
+     published with different <literal>WHERE</literal> clauses, rows that satisfy
+     <emphasis>any</emphasis> of the expressions will be copied. (see
+     <xref linkend="logical-replication-rf-combining"/>).
+    </para>
+
+    <note>
+     <para>
+      Publication operations of filters are ignored during initial table
+      synchronization.
+     </para>
+    </note>
+
+    <note>
+     <para>
+      If a subscriber is a pre-15 version, the initial table synchronization
+      won't use row filters even if they are defined in the publication.
+     </para>
+    </note>
+
+   </sect3>
+
+   <sect3 id="logical-replication-rf-combining">
+    <title>Combining multiple row filters for the same table</title>
+
+    <para>
+     If the subscription has several publications in which the same table has
+     been published with different row filters (for the same publish
+     operation), those expressions get OR'ed together, so that rows satisfying
+     <emphasis>any</emphasis> of the expressions will be replicated.
+   </para>
+
+   <para>
+    This means all the other filters (for the same table) become redundant if:
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       one of the publications has no filter at all.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       one of the publications was created using <literal>FOR ALL TABLES</literal>.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES IN SCHEMA</literal> and the table belongs to
+       that schema.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>
+
+   </sect3>
+
+   <sect3 id="logical-replication-rf-psql">
+    <title>PSQL commands to display Row Filters</title>
+
+    <para>
+     PSQL commands <command>\dRp+</command> and <command>\d</command> will
+     display row filter information.
+    </para>
+
+   </sect3>
+
+   <sect3 id="logical-replication-rf-examples">
+    <title>Examples</title>
+
+    <para>
+     Create some tables to be used in the following examples.
+<programlisting>
+test_pub=# CREATE TABLE t1(a int, b int, c text);
+CREATE TABLE
+test_pub=# CREATE TABLE t2(d int, e int, f int);
+CREATE TABLE
+</programlisting>
+    </para>
+
+    <para>
+     Create some publications.
+    </para>
+
+    <para>
+     Notice that table <literal>t1</literal> is a member of multiple
+     publications.
+<programlisting>
+test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c IS NULL);
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+</programlisting>
+    </para>
+
+    <para>
+     The PSQL command <command>\dRp+</command> shows what row filters are
+     defined for the publications.
+<programlisting>
+test_pub=# \dRp+
+                               Publication p1
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1" WHERE ((a > 5) AND (c IS NULL))
+
+                               Publication p2
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1"
+    "public.t2" WHERE (e = 99)
+</programlisting>
+    </para>
+
+    <para>
+     The PSQL command <command>\d</command> shows what publications the table is
+     a member of, as well as that table's row filter in those publications.
+    </para>
+
+    <para>
+     Notice that table <literal>t1</literal> is a member of two publications, but
+     only has a <literal>WHERE</literal> clause for <literal>p1</literal>.
+<programlisting>
+test_pub=# \d t1
+                 Table "public.t1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+ b      | integer |           |          | 
+ c      | text    |           |          | 
+Publications:
+    "p1" WHERE ((a > 5) AND (c IS NULL))
+    "p2"
+
+test_pub=# \d t2
+                 Table "public.t2"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ d      | integer |           |          | 
+ e      | integer |           |          | 
+ f      | integer |           |          | 
+Publications:
+    "p2" WHERE (e = 99)
+</programlisting>
+    </para>
+
+   </sect3>
+
+  </sect2>
+ </sect1>
+
  <sect1 id="logical-replication-conflicts">
   <title>Conflicts</title>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index 4979b9b646..cb16e97b22 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -247,6 +247,8 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    <literal>publish_via_partition_root</literal> determines if it uses the
    partition's row filter (if the parameter is false, the default) or the root
    partitioned table's row filter.
+   See <xref linkend="logical-replication-row-filtering"/> for more details about 
+   row filters.
   </para>
 
   <para>
-- 
2.35.1

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Peter Smith (#1)

Re: PG DOCS - logical replication filtering

On 02.03.22 05:47, Peter Smith wrote:

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The pending feature to select a subset of table columns to replicate is
not "column filtering". The thread might still be still called that,
but we have changed the patch to not use that terminology.

Filtering is a dynamic action based on actual values. The row filtering
feature does that. The column list feature is a static DDL-time
configuration. It is no more filtering than specifying a list of tables
in a publication is table filtering.

So please consider organizing the documentation differently to not
create this confusion.

smithpb2250@gmail.com

almost 4 years ago

In reply to: Aleksander Alekseev (#5)

Re: PG DOCS - logical replication filtering

On Wed, Mar 2, 2022 at 8:43 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
...

Here is an updated version of the patch.

Thanks for your review comments and fixes. The updated v2 patch looks
good to me.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

smithpb2250@gmail.com

almost 4 years ago

In reply to: Amit Kapila (#4)

Re: PG DOCS - logical replication filtering

On Wed, Mar 2, 2022 at 8:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 2, 2022 at 2:37 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

I see that a large part of the documentation is commented and marked as TBA (Column Filters, Combining Different Kinds of Filters). Could you please clarify if it's a work-in-progress patch? If it's not, I believe the commented part should be removed before committing.

I think we can remove any Column Filters related information
(placeholders), if that patch gets committed, we can always extend the
existing docs.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

[1]: https://www.postgresql.org/docs/devel/logical-replication.html

amit.kapila16@gmail.com

almost 4 years ago

In reply to: Peter Eisentraut (#6)

Re: PG DOCS - logical replication filtering

On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 02.03.22 05:47, Peter Smith wrote:

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The pending feature to select a subset of table columns to replicate is
not "column filtering". The thread might still be still called that,
but we have changed the patch to not use that terminology.

Filtering is a dynamic action based on actual values. The row filtering
feature does that. The column list feature is a static DDL-time
configuration. It is no more filtering than specifying a list of tables
in a publication is table filtering.

So please consider organizing the documentation differently to not
create this confusion.

+1. I think Row Filters can directly be a section just before
Conflicts on the logical replication page [1]https://www.postgresql.org/docs/devel/logical-replication.html.

Some comments on the patch:
1. I think we can extend/add the example to have filters on more than
one table. This has been noticed multiple times during development
that people are not very clear on it.
2. I think we can add an example or two for row filters actions (like
Insert, Update).
3.
 Publications can choose to limit the changes they produce to
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
-   <command>DELETE</command>, and <command>TRUNCATE</command>,
similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
+   <command>DELETE</command>, and <command>TRUNCATE</command> by using
+   <quote>operation filters</quote>.

By this, one can imply that row filters are used for Truncate as well
but that is not true. I know that that patch later specifies that "Row
filters have no effect for <command>TRUNCATE</command> commands." but
the above modification is not very clear.

--
With Regards,
Amit Kapila.

#10

smithpb2250@gmail.com

almost 4 years ago

In reply to: Amit Kapila (#9)

Re: PG DOCS - logical replication filtering

On Thu, Mar 3, 2022 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 02.03.22 05:47, Peter Smith wrote:

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The pending feature to select a subset of table columns to replicate is
not "column filtering". The thread might still be still called that,
but we have changed the patch to not use that terminology.

Filtering is a dynamic action based on actual values. The row filtering
feature does that. The column list feature is a static DDL-time
configuration. It is no more filtering than specifying a list of tables
in a publication is table filtering.

So please consider organizing the documentation differently to not
create this confusion.

+1. I think Row Filters can directly be a section just before
Conflicts on the logical replication page [1].

OK. I will reorganize the page as suggested, and also attend to the
other comments below.

Some comments on the patch:
1. I think we can extend/add the example to have filters on more than
one table. This has been noticed multiple times during development
that people are not very clear on it.
2. I think we can add an example or two for row filters actions (like
Insert, Update).
3.
Publications can choose to limit the changes they produce to
any combination of <command>INSERT</command>, <command>UPDATE</command>,
-   <command>DELETE</command>, and <command>TRUNCATE</command>,
similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
+   <command>DELETE</command>, and <command>TRUNCATE</command> by using
+   <quote>operation filters</quote>.
By this, one can imply that row filters are used for Truncate as well
but that is not true. I know that that patch later specifies that "Row
filters have no effect for <command>TRUNCATE</command> commands." but
the above modification is not very clear.

[1] - https://www.postgresql.org/docs/devel/logical-replication.html

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

#11

smithpb2250@gmail.com

almost 4 years ago

In reply to: Aleksander Alekseev (#5)

1 attachment(s)

Re: PG DOCS - logical replication filtering

PSA patch v3 to address all comments received so far.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v3-0001-Update-the-documentation-for-logical-replication.patchapplication/octet-stream; name=v3-0001-Update-the-documentation-for-logical-replication.patchDownload

From 48359d69107da581d004da37230f949cb8796eb8 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 4 Mar 2022 14:32:33 +1100
Subject: [PATCH v3] Update the documentation for logical replication.

This patch introduces a new documentation page for describing
the "Row Filtering" feature.

Author: Peter Smith
Reviewed By: Greg Nancarrow, Aleksander Alekseev, Amit Kapila, Ajin Cherian
Discussion: https://www.postgresql.org/message-id/CAHut%2BPtnsBr59%3D_NvxXp_%3DS-em0WxyuDOQmSTuHGb4sVhkHffg%40mail.gmail.com
---
 doc/src/sgml/logical-replication.sgml    | 521 +++++++++++++++++++++++++++++++
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 523 insertions(+)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 96b4886..657b460 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -118,6 +118,8 @@
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
    particular event types.  By default, all operation types are replicated.
+   (Row filters have no effect for <command>TRUNCATE</command>. See 
+   <xref linkend="logical-replication-row-filtering"/>).
   </para>
 
   <para>
@@ -317,6 +319,525 @@
   </sect2>
  </sect1>
 
+ <sect1 id="logical-replication-row-filtering">
+  <title>Row Filters</title>
+
+  <para>
+   By default, all data from all published tables will be replicated to the
+   appropriate subscribers.
+  </para>
+
+  <para>
+   The default data replication can be reduced by using <firstterm>row filters</firstterm>.
+   A user might choose to use row filters for behavioral or for performance
+   reasons.
+  </para>
+
+  <para>
+   With row filtering, the decision whether to replicate row data depends on
+   the row data values. Rows that don't satisfy an optional <literal>WHERE</literal>
+   clause will be filtered out. This allows a database or set of tables to be
+   partially replicated.
+  </para>
+
+  <para>
+   Use the <literal>PUBLICATION ... FOR TABLE </literal>
+   <link linkend="sql-createpublication"><literal>
+   <replaceable class="parameter">table_name</replaceable> WHERE (
+   <replaceable class="parameter">expression</replaceable>)</literal></link>
+   to specify which rows will be replicated.
+  </para>
+
+  <para>
+   The row filtering is defined per table. A new row filter can be added simply
+   by specifying a <literal>WHERE</literal> clause after the table name. The
+   <literal>WHERE</literal> clause must be enclosed by parentheses.
+  </para>
+
+  <sect2 id="logical-replication-rf-rules">
+   <title>Row filter Rules</title>
+
+   <para>
+    If the row filter evaluates to <literal>false</literal> or
+    <literal>NULL</literal> then the row is not replicated.
+   </para>
+
+   <para>
+    Row filters are applied <emphasis>before</emphasis> publishing the changes.
+   </para>
+
+   <para>
+    The <literal>WHERE</literal> clause expression is evaluated with the same
+    role used for the replication connection.
+   </para>
+
+   <para>
+    Row filters have no effect for <command>TRUNCATE</command> commands.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-rf-transformations">
+   <title>UPDATE transformations</title>
+
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command>.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    anything.
+   </para>
+
+   <para>
+    If only one of the old/new rows matches the row filter expression, the
+    <command>UPDATE</command> is transformed to <command>INSERT</command> or
+    <command>DELETE</command>, to avoid any data inconsistency, as follows:
+<synopsis>
+Case 1: old-row (no match)    new-row (no match)   -->   (drop change)
+Case 2: old-row (no match)    new row (match)      -->   INSERT
+Case 3: old-row (match)       new-row (no match)   -->   DELETE
+Case 4: old-row (match)       new row (match)      -->   UPDATE
+</synopsis>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-rf-partitions">
+   <title>Partition tables</title>
+
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-rf-restrictions">
+   <title>WHERE clause Expression Restrictions</title>
+
+   <para>
+    The row filter <literal>WHERE</literal> clause for a table added to a
+    publication that publishes <command>UPDATE</command> and/or
+    <command>DELETE</command> operations must contain only columns that are
+    covered by the <quote>replica identity</quote>. The row filter
+    <literal>WHERE</literal> clause for a table added to a publication that
+    publishes <command>INSERT</command> can use any column.
+   </para>
+
+   <para>
+    The <literal>WHERE</literal> clause allows only simple expressions that
+    don't have user-defined functions, operators, non-immutable built-in
+    functions, or references to system columns.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-rf-tablesync">
+   <title>Subscriber initial table synchronization</title>
+
+   <para>
+    If you choose to do the initial table synchronization, only data that
+    satisfies the row filters is copied to the subscriber.
+   </para>
+
+   <para>
+    If the subscription has several publications in which a table has been
+    published with different <literal>WHERE</literal> clauses, rows that satisfy
+    <emphasis>any</emphasis> of the expressions will be copied. (see
+    <xref linkend="logical-replication-rf-combining"/>).
+   </para>
+
+   <note>
+    <para>
+     Publication operations of filters are ignored during initial table
+     synchronization.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If a subscriber is a pre-15 version, the initial table synchronization
+     won't use row filters even if they are defined in the publication.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="logical-replication-rf-combining">
+   <title>Combining multiple row filters for the same table</title>
+
+   <para>
+    If the subscription has several publications in which the same table has
+    been published with different row filters (for the same publish
+    operation), those expressions get OR'ed together, so that rows satisfying
+    <emphasis>any</emphasis> of the expressions will be replicated.
+   </para>
+
+   <para>
+    This means all the other filters (for the same table) become redundant if:
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       one of the publications has no filter at all.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       one of the publications was created using <literal>FOR ALL TABLES</literal>.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES IN SCHEMA</literal> and the table belongs to
+       that schema.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-rf-psql">
+   <title>PSQL commands to display Row Filters</title>
+
+   <para>
+    PSQL commands <command>\dRp+</command> and <command>\d</command> will
+    display row filter information.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-rf-examples">
+   <title>Examples</title>
+
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+test_pub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+test_pub=# CREATE TABLE t2(d int primary key, e int, f int);
+CREATE TABLE
+test_pub=# CREATE TABLE t3(g int primary key, h int, i int);
+CREATE TABLE
+</programlisting>
+   </para>
+
+   <para>
+    Create some publications.
+   </para>
+
+   <para>
+    Notice that publication <literal>p2</literal> has tables with and without
+    row filters.
+   </para>
+
+   <para>
+    Notice that publication <literal>p3</literal> has row filters on multiple
+    tables.
+<programlisting>
+test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+   </para>
+
+   <para>
+    The PSQL command <command>\dRp+</command> shows what row filters are
+    defined for the publications.
+<programlisting>
+test_pub=# \dRp+
+                               Publication p1
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1" WHERE ((a > 5) AND (c = 'NSW'::text))
+
+                               Publication p2
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1"
+    "public.t2" WHERE (e = 99)
+
+                               Publication p3
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t2" WHERE (d = 10)
+    "public.t3" WHERE (g = 10)
+</programlisting>
+   </para>
+
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter in those publications.
+   </para>
+
+   <para>
+    Notice that table <literal>t1</literal> is a member of two publications, but
+    has a <literal>WHERE</literal> clause only for <literal>p1</literal>.
+   </para>
+
+   <para>
+    Notice that table <literal>t2</literal> is a member of two publications, and
+    has a different <literal>WHERE</literal> clause for each of them.
+<programlisting>
+test_pub=# \d t1
+                 Table "public.t1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           | not null | 
+ b      | integer |           |          | 
+ c      | text    |           | not null | 
+Indexes:
+    "t1_pkey" PRIMARY KEY, btree (a, c)
+Publications:
+    "p1" WHERE ((a > 5) AND (c = 'NSW'::text))
+    "p2"
+</programlisting>
+<programlisting>
+test_pub=# \d t2
+                 Table "public.t2"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ d      | integer |           | not null | 
+ e      | integer |           |          | 
+ f      | integer |           |          | 
+Indexes:
+    "t2_pkey" PRIMARY KEY, btree (d)
+Publications:
+    "p2" WHERE (e = 99)
+    "p3" WHERE (d = 10)
+</programlisting>
+<programlisting>
+test_pub=# \d t3
+                 Table "public.t3"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ g      | integer |           | not null | 
+ h      | integer |           |          | 
+ i      | integer |           |          | 
+Indexes:
+    "t3_pkey" PRIMARY KEY, btree (g)
+Publications:
+    "p3" WHERE (g = 10)
+</programlisting>
+   </para>
+
+   <para>
+    On the subscriber node, create a table <literal>t1</literal> and a
+    subscription <literal>s1</literal> to publication <literal>p1</literal>. 
+<programlisting>
+test_sub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+test_sub=# CREATE SUBSCRIPTION s1
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=s1'
+test_sub-# PUBLICATION p1;
+CREATE SUBSCRIPTION
+</programlisting>
+   </para>
+
+   <para>
+    <literal>INSERT</literal> some data.
+   </para>
+
+   <para>
+    Notice that only the rows satisfying the <literal>t1 WHERE</literal>
+    clause of publication <literal>p1</literal> are replicated.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (2, 102, 'NSW');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (3, 103, 'QLD');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (4, 104, 'VIC');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (5, 105, 'ACT');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (6, 106, 'NSW');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (7, 107, 'NT');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (8, 108, 'QLD');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (9, 109, 'NSW');
+INSERT 0 1
+</programlisting>
+<programlisting>
+test_pub=# SELECT * FROM t1; 
+ a |  b  |  c
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 6 | 106 | NSW
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 6 | 106 | NSW
+ 9 | 109 | NSW
+(2 rows)
+</programlisting>
+   </para>
+
+   <para>
+    <literal>UPDATE</literal> some data, where the old and new values both
+    satisfy the <literal>t1 WHERE</literal> clause of publication
+    <literal>p1</literal>.
+   </para>
+
+   <para>
+    Notice this replicates as a normal <literal>UPDATE</literal>.
+<programlisting>
+test_pub=# UPDATE t1 SET b = 999 WHERE a = 6;
+UPDATE 1
+</programlisting>
+<programlisting>
+test_pub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(2 rows)
+</programlisting>
+   </para>
+
+   <para>
+    <literal>UPDATE</literal> some data, where the old values did not satisfy
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new values do satisfy it.
+   </para>
+
+   <para>
+    Notice the <literal>UPDATE</literal> is transformed and replicates as an
+    <literal>INSERT</literal>.
+<programlisting>
+test_pub=# UPDATE t1 SET a = 999 WHERE a = 2;
+UPDATE 1
+</programlisting>
+<programlisting>
+test_pub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 999 | 102 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 999 | 102 | NSW
+(3 rows)
+</programlisting>
+   </para>
+
+   <para>
+    <literal>UPDATE</literal> some data, where the old values satisfied
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new values do not satisfy it.
+   </para>
+
+   <para>
+    Notice the <literal>UPDATE</literal> is transformed and replicates as a
+    <literal>DELETE</literal>.
+<programlisting>
+test_pub=# UPDATE t1 SET c = 'VIC' WHERE a = 9;
+UPDATE 1
+</programlisting>
+<programlisting>
+test_pub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   6 | 999 | NSW
+ 999 | 102 | NSW
+   9 | 109 | VIC
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   6 | 999 | NSW
+ 999 | 102 | NSW
+(2 rows)
+</programlisting>
+   </para>
+
+  </sect2>
+
+ </sect1>
+
  <sect1 id="logical-replication-conflicts">
   <title>Conflicts</title>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index 4979b9b..cb16e97 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -247,6 +247,8 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    <literal>publish_via_partition_root</literal> determines if it uses the
    partition's row filter (if the parameter is false, the default) or the root
    partitioned table's row filter.
+   See <xref linkend="logical-replication-row-filtering"/> for more details about 
+   row filters.
   </para>
 
   <para>
-- 
1.8.3.1

#12

smithpb2250@gmail.com

almost 4 years ago

In reply to: Amit Kapila (#9)

Re: PG DOCS - logical replication filtering

On Thu, Mar 3, 2022 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 02.03.22 05:47, Peter Smith wrote:

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The pending feature to select a subset of table columns to replicate is
not "column filtering". The thread might still be still called that,
but we have changed the patch to not use that terminology.

Filtering is a dynamic action based on actual values. The row filtering
feature does that. The column list feature is a static DDL-time
configuration. It is no more filtering than specifying a list of tables
in a publication is table filtering.

So please consider organizing the documentation differently to not
create this confusion.

+1. I think Row Filters can directly be a section just before
Conflicts on the logical replication page [1].

Some comments on the patch:
1. I think we can extend/add the example to have filters on more than
one table. This has been noticed multiple times during development
that people are not very clear on it.

Added example in v3 [1]/messages/by-id/CAHut+Ptwp0FscVpmMjHLV6_=SHR5ndwvWdX_gb41_3H2UA9ecA@mail.gmail.com

2. I think we can add an example or two for row filters actions (like
Insert, Update).

Added examples of INSERT and UPDATE in v3 [1]/messages/by-id/CAHut+Ptwp0FscVpmMjHLV6_=SHR5ndwvWdX_gb41_3H2UA9ecA@mail.gmail.com

3.
Publications can choose to limit the changes they produce to
any combination of <command>INSERT</command>, <command>UPDATE</command>,
-   <command>DELETE</command>, and <command>TRUNCATE</command>,
similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
+   <command>DELETE</command>, and <command>TRUNCATE</command> by using
+   <quote>operation filters</quote>.
By this, one can imply that row filters are used for Truncate as well
but that is not true. I know that that patch later specifies that "Row
filters have no effect for <command>TRUNCATE</command> commands." but
the above modification is not very clear.

Fixed in v3 [1]/messages/by-id/CAHut+Ptwp0FscVpmMjHLV6_=SHR5ndwvWdX_gb41_3H2UA9ecA@mail.gmail.com. Restored original text, and added a note about row filters.

------
[1]: /messages/by-id/CAHut+Ptwp0FscVpmMjHLV6_=SHR5ndwvWdX_gb41_3H2UA9ecA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.

#13

euler@eulerto.com

almost 4 years ago

In reply to: Peter Smith (#11)

1 attachment(s)

Re: PG DOCS - logical replication filtering

On Fri, Mar 4, 2022, at 12:41 AM, Peter Smith wrote:

PSA patch v3 to address all comments received so far.

Peter,

I started reviewing this documentation for row filter and I noticed that I was
changing too much lines to submit a patch on the top of it. Hence, I'm
attaching a new version based on this one.

Here as some of the changes that I did:

* links: I renamed the ids to use "logical-replication-row-filter" instead of
"logical-replication-rf" because it is used in the anchors. A compound word
is better than an abbreviation.
* titles: I changed all titles because of some stylish issue (words are
generally capitalized) or because it reads better.
* sections: I moved the "Restrictions" section to the top but it seems
important than the other sections. I removed the "psql" section. The psql
commands are shown in the "Examples" section so I don't think we should
provide a section for it.
* sentences: I expanded several sentences to provide details about the specific
topic. I also reordered some sentences and removed some duplicated sentences.
* replica identity: I added a link to replica identity. It is a key concept for
row filter.
* transformations: I added a few sentences explaining when/why a transformation
is required. I removed the "Cases" part (same as in the source code) because
it is redundant with the new sentences.
* partitioned table: the title should be _partitioned_ table. I replaced the
bullets with sentences in the same paragraph.
* initial data sync: I removed the "subscriber" from the section title. When
you say "initial data synchronization" it seems clear we're talking about the
subscriber. I also add a sentence saying why pre-15 does not consider row
filters.
* combining row filters: I renamed the section and decided to remove "for the
same table". When reading the first sentences from this section, it is clear
that row filtering is per table. So if you are combining multiple row
filters, it should be for the same table. I also added a sentence saying why
some clauses make the row filter irrelevant.
* examples: I combined the psql commands that shows row filter information
together (\dRp+ and \d). I changed to connection string to avoid "localhost".
Why? It does not seem a separate service (there is no port) and setup pub/sub
in the same cluster requires additional steps. It is better to illustrate
different clusters (at least it seems so since we don't provide details from
publisher). I changed a value in an UPDATE because both UPDATEs use 999.

We could probably reduce the number of rows in the example but I didn't bother
to remove them.

It seems we can remove some sentences from the CREATE PUBLICATION because we
have a new section that explains all of it. I think the link that was added by
this patch is sufficient.

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachments:

v4-0001-doc-new-section-for-row-filter.patchtext/x-patch; name=v4-0001-doc-new-section-for-row-filter.patchDownload

From aea0c822489fbdeba14141e215f76e3777521bb2 Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler.taveira@enterprisedb.com>
Date: Thu, 10 Mar 2022 17:38:24 -0300
Subject: [PATCH v4] doc: new section for row filter.

---
 doc/src/sgml/logical-replication.sgml    | 444 +++++++++++++++++++++++
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 446 insertions(+)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 82326c3901..38ac77b6e1 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -118,6 +118,8 @@
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
    particular event types.  By default, all operation types are replicated.
+   (Row filters have no effect for <command>TRUNCATE</command>. See 
+   <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +319,448 @@
   </sect2>
  </sect1>
 
+ <sect1 id="logical-replication-row-filter">
+  <title>Row Filter</title>
+
+  <para>
+   The published tables replicate all data to the subscribers. The replicated
+   data can be selected using a <firstterm>row filter</firstterm>. The user
+   might choose to use row filters for behavioral, security or performance
+   reasons.
+  </para>
+
+  <para>
+   If a published table sets a row filter, a row is replicated if its data
+   satisfies the row filter expression. It means if row filters are used in a
+   set of tables, data will be partially replicated.
+  </para>
+
+  <para>
+   The row filter is defined per table. Use a <literal>WHERE</literal> clause
+   after the table name for each published table that requires data to be
+   filtered out. The <literal>WHERE</literal> clause must be enclosed by
+   parentheses. See <xref linkend="sql-createpublication"/> for details.
+  </para>
+
+  <sect2 id="logical-replication-row-filter-rules">
+   <title>Row Filter Rules</title>
+
+   <para>
+    Row filter is applied <emphasis>before</emphasis> publishing the changes.
+   </para>
+
+   <para>
+    If the row filter evaluates to <literal>false</literal> or
+    <literal>NULL</literal> then the row is not replicated.
+   </para>
+
+   <para>
+    The <literal>WHERE</literal> clause expression is evaluated with the same
+    role used for the replication connection. It means the role specified in the
+    <literal>CONNECTION</literal> clause of the <xref linkend="sql-createsubscription"/>.
+   </para>
+
+   <para>
+    Row filter has no effect for <command>TRUNCATE</command> command.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-restrictions">
+   <title>Expression Restrictions</title>
+
+   <para>
+    The <literal>WHERE</literal> clause allows only simple expressions. It
+    cannot contain user-defined functions, operators, types, and collations,
+    system column references and non-immutable built-in functions.
+   </para>
+
+   <para>
+    If a publication publishes <literal>update</literal> or
+    <literal>delete</literal> operations, the <literal>WHERE</literal> clause
+    must contain only columns that are part of the replica identity (see
+    <xref linkend="sql-altertable-replica-identity"/>). If it publishes only
+    <literal>insert</literal> operation, the expression can use any column.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-transformations">
+   <title>UPDATE Transformations</title>
+
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for the old and new row. If both evaluations are
+    <literal>true</literal>, it replicates the <command>UPDATE</command> change.
+    If both evaluations are <literal>false</literal>, it doesn't replicate the
+    change. If only one of the rows matches the row filter expression, a
+    transformation is required. <command>UPDATE</command> should be transformed
+    into a <command>DELETE</command> or <command>INSERT</command> to avoid data
+    inconsistency. The row on the subscriber should reflect what is defined by
+    the row filter expression on the publisher.
+   </para>
+
+   <para>
+    If the old row satisfies the row filter expression (it was sent to the
+    subscriber) but the new row doesn't, after the <command>UPDATE</command>,
+    from a data consistency perspective, the old row should be removed from the
+    subscriber. Hence, <command>UPDATE</command> should be transformed into a
+    <command>DELETE</command>.
+   </para>
+
+   <para>
+    On the other hand, if the old row doesn't satisfy the row filter expression
+    (it wasn't sent to the subscriber) but the new row does, after the
+    <command>UPDATE</command>, from a data consistency perspective, that new row
+    should be inserted into the subscriber. Hence, <command>UPDATE</command>
+    should be transformed into an <command>INSERT</command>.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-partitioned-table">
+   <title>Partitioned Table</title>
+
+   <para>
+    If the publication contains a partitioned table, the parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    expression is used. If the parameter
+    <literal>publish_via_partition_root</literal> is true, the row filter
+    expression associated with the partitioned table is used. Otherwise, the row
+    filter expression associated with the individual partition is used.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-initial-data-sync">
+   <title>Initial Data Synchronization</title>
+
+   <para>
+    If the subscription requires copying pre-existing data in the publications
+    and a publication contains <literal>WHERE</literal> clauses, only data that
+    satisfies the row filter expressions is copied to the subscriber.
+   </para>
+
+   <para>
+    If the subscription has several publications in which a table has been
+    published with different <literal>WHERE</literal> clauses, rows that satisfy
+    any of the expressions will be copied (see
+    <xref linkend="logical-replication-row-filter-combining"/> for details).
+   </para>
+
+   <para>
+    If the subscriber is in a release prior to 15, copy pre-existing data
+    doesn't use row filter expressions even if it is defined in the publication.
+    This is because old releases can only copy the entire table data.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-combining">
+   <title>Combining Multiple Row Filters</title>
+
+   <para>
+    If the subscription has several publications in which the same table has
+    been published with different row filters (for the same
+    <literal>publish</literal> operation), rows that satisfy any of the row
+    filter expressions is replicated. It means all other row filters for the
+    same table become irrelevant if:
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       one of the publications has no row filter;
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES</literal> clause. This clause does not allow row
+       filters;
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES IN SCHEMA</literal> clause and the table belongs
+       to the referred schema. This clause does not allow row filters.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-examples">
+   <title>Examples</title>
+
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int, e int, f int, primary key(d));
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int, h int, i int, primary key(g));
+CREATE TABLE
+</programlisting>
+   </para>
+
+   <para>
+    Create some publications.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+   </para>
+
+   <para>
+    The publication <literal>p1</literal> contains only one table
+    (<literal>t1</literal>) that has a row filter expression for it.
+   </para>
+
+   <para>
+    The publication <literal>p2</literal> contains 2 tables. The table
+    <literal>t1</literal> has no row filter and the table <literal>t2</literal>
+    has a row filter.
+   </para>
+
+   <para>
+    The publication <literal>p3</literal> contains 2 tables and each one has a
+    defined row filter.
+   </para>
+
+   <para>
+    The <command>psql</command> shows the row filter expressions (if defined)
+    for each table.
+
+<programlisting>
+testpub=# \dRp+
+                               Publication p1
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1" WHERE ((a > 5) AND (c = 'NSW'::text))
+
+                               Publication p2
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1"
+    "public.t2" WHERE (e = 99)
+
+                               Publication p3
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t2" WHERE (d = 10)
+    "public.t3" WHERE (g = 10)
+
+testpub=# \d t1
+                 Table "public.t1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           | not null | 
+ b      | integer |           |          | 
+ c      | text    |           | not null | 
+Indexes:
+    "t1_pkey" PRIMARY KEY, btree (a, c)
+Publications:
+    "p1" WHERE ((a > 5) AND (c = 'NSW'::text))
+    "p2"
+
+testpub=# \d t2
+                 Table "public.t2"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ d      | integer |           | not null | 
+ e      | integer |           |          | 
+ f      | integer |           |          | 
+Indexes:
+    "t2_pkey" PRIMARY KEY, btree (d)
+Publications:
+    "p2" WHERE (e = 99)
+    "p3" WHERE (d = 10)
+</programlisting>
+   </para>
+
+   <para>
+    On the subscriber, create the table <literal>t1</literal> with the same
+    definition as the one on the publisher and also create the subscription
+    <literal>s1</literal> that subscribes to the publication
+    <literal>p1</literal>.
+<programlisting>
+testsub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testsub=# CREATE SUBSCRIPTION s1
+testsub-# CONNECTION 'host=10.1.2.3 port=5432 user=repuser dbname=testpub'
+testsub-# PUBLICATION p1;
+CREATE SUBSCRIPTION
+</programlisting>
+   </para>
+
+   <para>
+    Insert some rows.
+<programlisting>
+testpub=# INSERT INTO t1 VALUES (2, 102, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (3, 103, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (4, 104, 'VIC');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (5, 105, 'ACT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (6, 106, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (7, 107, 'NT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (8, 108, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (9, 109, 'NSW');
+INSERT 0 1
+</programlisting>
+<programlisting>
+testpub=# SELECT * FROM t1; 
+ a |  b  |  c
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 6 | 106 | NSW
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+(8 rows)
+</programlisting>
+   </para>
+
+   <para>
+    Notice that only the rows satisfying the <literal>t1</literal>
+    <literal>WHERE</literal> clause of the publication
+    <literal>p1</literal> are replicated.
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 6 | 106 | NSW
+ 9 | 109 | NSW
+(2 rows)
+</programlisting>
+   </para>
+
+   <para>
+    Update a row whose both (old and new) rows satisfy the <literal>t1</literal>
+    <literal>WHERE</literal> clause of the publication <literal>p1</literal>. It
+    replicates the <command>UPDATE</command> as expected.
+<programlisting>
+testpub=# UPDATE t1 SET b = 999 WHERE a = 6;
+UPDATE 1
+</programlisting>
+<programlisting>
+testpub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(2 rows)
+</programlisting>
+   </para>
+
+   <para>
+    Update a row whose the old row doesn't satisfy the <literal>t1</literal>
+    <literal>WHERE</literal> clause of the publication <literal>p1</literal>
+    but the new row satisfies it. The <literal>UPDATE</literal> is transformed
+    into an <command>INSERT</command> and the change is replicated. See the new
+    row on the subscriber.
+<programlisting>
+testpub=# UPDATE t1 SET a = 555 WHERE a = 2;
+UPDATE 1
+</programlisting>
+<programlisting>
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(3 rows)
+</programlisting>
+   </para>
+
+   <para>
+    Update a row whose the old row satisfies the <literal>t1</literal>
+    <literal>WHERE</literal> clause of the publication <literal>p1</literal>
+    but the new row doesn't satisfy it. The <command>UPDATE</command> is
+    transformed into a <command>DELETE</command> and the change is replicated.
+    See that the row is removed from the subscriber.
+<programlisting>
+testpub=# UPDATE t1 SET c = 'VIC' WHERE a = 9;
+UPDATE 1
+</programlisting>
+<programlisting>
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   6 | 999 | NSW
+ 999 | 102 | NSW
+   9 | 109 | VIC
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   6 | 999 | NSW
+ 999 | 102 | NSW
+(2 rows)
+</programlisting>
+   </para>
+
+  </sect2>
+
+ </sect1>
+
  <sect1 id="logical-replication-conflicts">
   <title>Conflicts</title>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index 4979b9b646..3591d51af3 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -247,6 +247,8 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    <literal>publish_via_partition_root</literal> determines if it uses the
    partition's row filter (if the parameter is false, the default) or the root
    partitioned table's row filter.
+   See <xref linkend="logical-replication-row-filter"/> for details about row
+   filters.
   </para>
 
   <para>
-- 
2.30.2

#14

smithpb2250@gmail.com

almost 4 years ago

In reply to: Euler Taveira (#13)

1 attachment(s)

Re: PG DOCS - logical replication filtering

On Fri, Mar 11, 2022 at 9:37 AM Euler Taveira <euler@eulerto.com> wrote:

On Fri, Mar 4, 2022, at 12:41 AM, Peter Smith wrote:

PSA patch v3 to address all comments received so far.

Peter,

I started reviewing this documentation for row filter and I noticed that I was
changing too much lines to submit a patch on the top of it. Hence, I'm
attaching a new version based on this one.

Sorry for my long delay in replying. I have been away.

Thanks very much for the updated patch with all your suggestions.
There were many changes in your v4. I have not merged everything
exactly, but I did take the majority of your suggestions on-board in
the attached v5.

I responded below to each change.

Here as some of the changes that I did:

* links: I renamed the ids to use "logical-replication-row-filter" instead of
"logical-replication-rf" because it is used in the anchors. A compound word
is better than an abbreviation.

OK, done as suggested.

* titles: I changed all titles because of some stylish issue (words are
generally capitalized) or because it reads better.

OK, most titles changed as suggested.

* sections: I moved the "Restrictions" section to the top but it seems
important than the other sections. I removed the "psql" section. The psql
commands are shown in the "Examples" section so I don't think we should
provide a section for it.

OK, moved the "Restrictions" section and removed the "psql" section.

* sentences: I expanded several sentences to provide details about the specific
topic. I also reordered some sentences and removed some duplicated sentences.

I did not take everything exactly as in your v4, but wherever your
suggested changes added some more information I tried to include them
using similar wording to yours.

* replica identity: I added a link to replica identity. It is a key concept for
row filter.

OK, done as suggested.

* transformations: I added a few sentences explaining when/why a transformation
is required. I removed the "Cases" part (same as in the source code) because
it is redundant with the new sentences.

OK, I incorporated most of your new descriptions for the
transformations, however I still wanted to keep the summary of "cases"
part because IMO it makes the rules so much clearer than just having
the text descriptions.

* partitioned table: the title should be _partitioned_ table. I replaced the
bullets with sentences in the same paragraph.

OK. The title was changed, but I kept the bullets.

* initial data sync: I removed the "subscriber" from the section title. When
you say "initial data synchronization" it seems clear we're talking about the
subscriber. I also add a sentence saying why pre-15 does not consider row
filters.

OK. Title is changed. Also I added your sentence about the pre-15. But
I kept all the HTML note rendering that you'd removed in v4. I think
this information was important enough to be a "note" instead of just
buried in a paragraph.

* combining row filters: I renamed the section and decided to remove "for the
same table". When reading the first sentences from this section, it is clear
that row filtering is per table. So if you are combining multiple row
filters, it should be for the same table. I also added a sentence saying why
some clauses make the row filter irrelevant.

OK. Title is changed. Your extra sentence was added.

* examples: I combined the psql commands that shows row filter information
together (\dRp+ and \d). I changed to connection string to avoid "localhost".
Why? It does not seem a separate service (there is no port) and setup pub/sub
in the same cluster requires additional steps. It is better to illustrate
different clusters (at least it seems so since we don't provide details from
publisher). I changed a value in an UPDATE because both UPDATEs use 999.

I did not combine those \dRp+ and \d. I kept them separate so I could
write some separate notes about them.

I'm unsure about the connection change. This documentation is for "Row
Filters" so the examples were only intended to demonstrate row filters
and nothing else. I wanted to have a trivial connection such that a
user can just cut/paste directly from the example and get something
they can immediately test without having to change it. I don't mind
changing this later but probably I'd like to get some other opinions
about it first.

About the UPDATE (555 value) - OK, I changed that value as you suggested.

We could probably reduce the number of rows in the example but I didn't bother
to remove them.

It seems we can remove some sentences from the CREATE PUBLICATION because we
have a new section that explains all of it. I think the link that was added by
this patch is sufficient.

Yeah, maybe some sentences can be removed. But even though some
information is duplicated it might be useful to have a few things
still mentioned on the CREATE PUBLICATION page just so the user does
not have to chase links too much. I will wait to see if other people
have an opinion about it before removing any content from that page.
Meanwhile, I have made the create_publication.sgml identical to your
v4.

~~~

There should be much fewer v4/v5 differences now although there might
be a few things I missed that you want to re-comment on. Hopefully, it
will now be easier to post review comments as BEFORE/AFTER text
fragments - that would help other people to see the suggestions more
easily and give their opinions too.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachments:

v5-0001-This-patch-introduces-a-new-documentation-page-fo.patchapplication/octet-stream; name=v5-0001-This-patch-introduces-a-new-documentation-page-fo.patchDownload

From 7ec8041194b458b71c2a4ca531de2f345d4d6f09 Mon Sep 17 00:00:00 2001
From: postgres <postgres@localhost.localdomain>
Date: Thu, 24 Mar 2022 16:38:29 +1100
Subject: [PATCH v5] This patch introduces a new documentation page for
 describing the "Row Filters" feature.

Author: Peter Smith, Euler Taveira
Reviewed By: Greg Nancarrow, Aleksander Alekseev, Amit Kapila, Ajin Cherian
Discussion: https://www.postgresql.org/message-id/CAHut%2BPtnsBr59%3D_NvxXp_%3DS-em0WxyuDOQmSTuHGb4sVhkHffg%40mail.gmail.com
---
 doc/src/sgml/logical-replication.sgml    | 521 +++++++++++++++++++++++++++++++
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 523 insertions(+)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 6431d47..504af09 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -118,6 +118,8 @@
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
    particular event types.  By default, all operation types are replicated.
+   (Row filters have no effect for <command>TRUNCATE</command>. See 
+   <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +319,525 @@
   </sect2>
  </sect1>
 
+ <sect1 id="logical-replication-row-filter">
+  <title>Row Filters</title>
+
+  <para>
+   By default, all data from all published tables will be replicated to the
+   appropriate subscribers.
+  </para>
+
+  <para>
+   The replicated data can be reduced by using a <firstterm>row filter</firstterm>.
+   A user might choose to use row filters for behavioral, security or performance
+   reasons.
+  </para>
+
+  <para>
+   If a published table sets a row filter, a row is replicated only if its data
+   satisfies the row filter expression. This allows a set of tables to be
+   partially replicated.
+  </para>
+
+  <para>
+   The row filter is defined per table. Use a <literal>WHERE</literal> clause
+   after the table name for each published table that requires data to be
+   filtered out. The <literal>WHERE</literal> clause must be enclosed by
+   parentheses. See <xref linkend="sql-createpublication"/> for details.
+  </para>
+
+  <sect2 id="logical-replication-row-filter-rules">
+   <title>Row Filter Rules</title>
+
+   <para>
+    Row filters are applied <emphasis>before</emphasis> publishing the changes.
+   </para>
+
+   <para>
+    If the row filter evaluates to <literal>false</literal> or
+    <literal>NULL</literal> then the row is not replicated.
+   </para>
+
+   <para>
+    The <literal>WHERE</literal> clause expression is evaluated with the same
+    role used for the replication connection. i.e. the role specified in the
+    <literal>CONNECTION</literal> clause of the <xref linkend="sql-createsubscription"/>.
+   </para>
+
+   <para>
+    Row filters have no effect for <command>TRUNCATE</command> command.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-restrictions">
+   <title>Expression Restrictions</title>
+
+   <para>
+    The <literal>WHERE</literal> clause allows only simple expressions. It
+    cannot contain user-defined functions, operators, types, and collations,
+    system column references or non-immutable built-in functions.
+   </para>
+
+   <para>
+    If a publication publishes <command>UPDATE</command> and/or
+    <command>DELETE</command> operations, the row filter <literal>WHERE</literal> clause
+    must contain only columns that are covered by the replica identity (see
+    <xref linkend="sql-altertable-replica-identity"/>). If a publication publishes only
+    <command>INSERT</command>, the row filter <literal>WHERE</literal> clause
+    can use any column.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-transformations">
+   <title>UPDATE Transformations</title>
+
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>
+
+   <para>
+    If only one of the old/new rows matches the row filter expression, the
+    <command>UPDATE</command> is transformed to <command>INSERT</command> or
+    <command>DELETE</command>, to avoid any data inconsistency. The row on the
+    subscriber should reflect what is defined by the row filter expression on
+    the publisher.
+
+   <itemizedlist>
+    <listitem>
+     <para>
+      If the old row satisfies the row filter expression (it was sent to the
+      subscriber) but the new row doesn't, then from a data consistency
+      perspective the old row should be removed from the subscriber.
+      So the <command>UPDATE</command> is transformed into a <command>DELETE</command>.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      If the old row doesn't satisfy the row filter expression (it wasn't sent
+      to the subscriber) but the new row does, then from a data consistency
+      perspective the new row should be added to the subscriber.
+      So the <command>UPDATE</command> is transformed into an <command>INSERT</command>.
+     </para>
+    </listitem>
+   </itemizedlist>
+
+Summary:
+<synopsis>
+Case 1: old-row (no match)    new-row (no match)   -->   (drop change)
+Case 2: old-row (no match)    new row (match)      -->   INSERT
+Case 3: old-row (match)       new-row (no match)   -->   DELETE
+Case 4: old-row (match)       new row (match)      -->   UPDATE
+</synopsis>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-partitioned-table">
+   <title>Partitioned Tables</title>
+
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-initial-data-sync">
+   <title>Initial Data Synchronization</title>
+
+   <para>
+    If the subscription requires copying pre-existing table data
+    and a publication contains <literal>WHERE</literal> clauses, only data that
+    satisfies the row filter expressions is copied to the subscriber.
+   </para>
+
+   <para>
+    If the subscription has several publications in which a table has been
+    published with different <literal>WHERE</literal> clauses, rows that satisfy
+    <emphasis>any</emphasis> of the expressions will be copied. (see
+    <xref linkend="logical-replication-row-filter-combining"/> for details).
+   </para>
+
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-combining">
+   <title>Combining Multiple Row Filters</title>
+
+   <para>
+    If the subscription has several publications in which the same table has
+    been published with different row filters (for the same <literal>publish</literal>
+    operation), those expressions get OR'ed together, so that rows satisfying
+    <emphasis>any</emphasis> of the expressions will be replicated.
+   </para>
+
+   <para>
+    This means all the other row filters for the same table become redundant if:
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       one of the publications has no row filter.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       one of the publications was created using <literal>FOR ALL TABLES</literal>.
+       This clause does not allow row filters.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES IN SCHEMA</literal> and the table belongs to
+       the referred schema. This clause does not allow row filters.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-examples">
+   <title>Examples</title>
+
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int primary key, e int, f int);
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int primary key, h int, i int);
+CREATE TABLE
+</programlisting>
+   </para>
+
+   <para>
+    Create some publications.
+   </para>
+   <para>
+    - notice publication <literal>p1</literal> has 1 table with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p2</literal> has 2 tables, one without a row
+    filter, and one with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p3</literal> has 2 tables, both with row filters.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+   </para>
+
+   <para>
+    The PSQL command <command>\dRp+</command> shows the row filter expressions
+    (if defined) for each table of the publications.
+<programlisting>
+testpub=# \dRp+
+                               Publication p1
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1" WHERE ((a > 5) AND (c = 'NSW'::text))
+
+                               Publication p2
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1"
+    "public.t2" WHERE (e = 99)
+
+                               Publication p3
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t2" WHERE (d = 10)
+    "public.t3" WHERE (g = 10)
+</programlisting>
+   </para>
+
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+   </para>
+   <para>
+    - notice table <literal>t1</literal> is a member of 2 publications, but
+    has a row filter only in <literal>p1</literal>.
+   </para>
+   <para>
+    - notice table <literal>t2</literal> is a member of 2 publications, and
+    has a different row filter in each of them.
+<programlisting>
+testpub=# \d t1
+                 Table "public.t1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           | not null | 
+ b      | integer |           |          | 
+ c      | text    |           | not null | 
+Indexes:
+    "t1_pkey" PRIMARY KEY, btree (a, c)
+Publications:
+    "p1" WHERE ((a > 5) AND (c = 'NSW'::text))
+    "p2"
+
+testpub=# \d t2
+                 Table "public.t2"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ d      | integer |           | not null | 
+ e      | integer |           |          | 
+ f      | integer |           |          | 
+Indexes:
+    "t2_pkey" PRIMARY KEY, btree (d)
+Publications:
+    "p2" WHERE (e = 99)
+    "p3" WHERE (d = 10)
+
+testpub=# \d t3
+                 Table "public.t3"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ g      | integer |           | not null | 
+ h      | integer |           |          | 
+ i      | integer |           |          | 
+Indexes:
+    "t3_pkey" PRIMARY KEY, btree (g)
+Publications:
+    "p3" WHERE (g = 10)
+</programlisting>
+   </para>
+
+   <para>
+    On the subscriber node, create a table <literal>t1</literal> with the same
+    definition as the one on the publisher, and also create the subscription
+    <literal>s1</literal> that subscribes to the publication <literal>p1</literal>.
+<programlisting>
+testsub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testsub=# CREATE SUBSCRIPTION s1
+testsub-# CONNECTION 'host=localhost dbname=testpub application_name=s1'
+testsub-# PUBLICATION p1;
+CREATE SUBSCRIPTION
+</programlisting>
+   </para>
+
+   <para>
+    Insert some rows.
+   </para>
+   <para>
+    - notice that only the rows satisfying the <literal>t1 WHERE</literal>
+    clause of publication <literal>p1</literal> are replicated.
+<programlisting>
+testpub=# INSERT INTO t1 VALUES (2, 102, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (3, 103, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (4, 104, 'VIC');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (5, 105, 'ACT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (6, 106, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (7, 107, 'NT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (8, 108, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (9, 109, 'NSW');
+INSERT 0 1
+
+testpub=# SELECT * FROM t1; 
+ a |  b  |  c
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 6 | 106 | NSW
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 6 | 106 | NSW
+ 9 | 109 | NSW
+(2 rows)
+</programlisting>
+   </para>
+
+   <para>
+    Update some data, where the old and new row values both
+    satisfy the <literal>t1 WHERE</literal> clause of publication
+    <literal>p1</literal>.
+   </para>
+
+   <para>
+    - notice this replicates as a normal <command>UPDATE</command>.
+<programlisting>
+testpub=# UPDATE t1 SET b = 999 WHERE a = 6;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(2 rows)
+</programlisting>
+   </para>
+
+   <para>
+    Update some data, where the old row values did not satisfy
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do satisfy it.
+   </para>
+
+   <para>
+    - notice the <command>UPDATE</command> is transformed into an <command>INSERT</command>
+    and the change is replicated.
+<programlisting>
+testpub=# UPDATE t1 SET a = 555 WHERE a = 2;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(3 rows)
+</programlisting>
+   </para>
+
+   <para>
+    Update some data, where the old row values satisfied
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do not satisfy it.
+   </para>
+   <para>
+    - notice the <command>UPDATE</command> is transformed into a <command>DELETE</command>
+    and the change is replicated.
+<programlisting>
+testpub=# UPDATE t1 SET c = 'VIC' WHERE a = 9;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   6 | 999 | NSW
+ 555 | 102 | NSW
+   9 | 109 | VIC
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(2 rows)
+</programlisting>
+   </para>
+
+  </sect2>
+
+ </sect1>
+
  <sect1 id="logical-replication-conflicts">
   <title>Conflicts</title>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index 4979b9b..3591d51 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -247,6 +247,8 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    <literal>publish_via_partition_root</literal> determines if it uses the
    partition's row filter (if the parameter is false, the default) or the root
    partitioned table's row filter.
+   See <xref linkend="logical-replication-row-filter"/> for details about row
+   filters.
   </para>
 
   <para>
-- 
1.8.3.1

#15

amit.kapila16@gmail.com

almost 4 years ago

In reply to: Peter Smith (#14)

Re: PG DOCS - logical replication filtering

On Thu, Mar 24, 2022 at 11:48 AM Peter Smith <smithpb2250@gmail.com> wrote:

Review comments:
===============
1.
+    The <literal>WHERE</literal> clause expression is evaluated with the same
+    role used for the replication connection. i.e. the role specified in the
+    <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>.

Can we use () around i.e. sentence? It looks bit odd to me now.
The <literal>WHERE</literal> clause expression is evaluated with the
same role used for the replication connection (i.e., the role
specified in the <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>).

2.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).

Can we write the below part slightly differently?
Before:
(i.e. before and after the data is updated).
After:
(i.e the data before and after the update).

3.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>

I think we can combine these three separate paragraphs.

4.
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>

I think we can combine this into single para as Euler had in his version.

5.
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored
when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>

I don't see the need for the first <note> here, the second one seems
to convey it.

6.
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int primary key, e int, f int);
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int primary key, h int, i int);
+CREATE TABLE
+</programlisting>
+   </para>
+
+   <para>
+    Create some publications.
+   </para>
+   <para>
+    - notice publication <literal>p1</literal> has 1 table with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p2</literal> has 2 tables, one without a row
+    filter, and one with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p3</literal> has 2 tables, both
with row filters.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+   </para>

I think it is better to use the corresponding content from Euler's version.

7.
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+   </para>
+   <para>
+    - notice table <literal>t1</literal> is a member of 2 publications, but
+    has a row filter only in <literal>p1</literal>.
+   </para>
+   <para>
+    - notice table <literal>t2</literal> is a member of 2 publications, and
+    has a different row filter in each of them.

This looks unnecessary to me. Let's remove this part.

8.
+   <para>
+    - notice that only the rows satisfying the <literal>t1 WHERE</literal>
+    clause of publication <literal>p1</literal> are replicated.

Again, it is better to use Euler's version for this and at the place,
he had in his version. Similarly, adjust other notices if any like
this one.

9. I suggest adding an example for partition tables showing how the
row filter is used based on the 'publish_via_partition_root'
parameter.

--
With Regards,
Amit Kapila.

#16

smithpb2250@gmail.com

almost 4 years ago

In reply to: Amit Kapila (#15)

1 attachment(s)

Re: PG DOCS - logical replication filtering

PSA patch v6 to address some of Amit's review comments [1]/messages/by-id/CAA4eK1JdwQQsxa+zpsBW5rCxEfXopYx381nwcCyeXk6mpF8ChQ@mail.gmail.com.

------
[1]: /messages/by-id/CAA4eK1JdwQQsxa+zpsBW5rCxEfXopYx381nwcCyeXk6mpF8ChQ@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v6-0001-PG-DOCS-page-for-row-filters.patchapplication/octet-stream; name=v6-0001-PG-DOCS-page-for-row-filters.patchDownload

From 47eab24dc65167ff4cd8c35fa86f596139f0afa3 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 8 Apr 2022 15:45:27 +1000
Subject: [PATCH v6] PG DOCS page for row filters.

This patch introduces a new documentation page for describing the "Row Filters" feature.

Author: Peter Smith, Euler Taveira
Reviewed By: Greg Nancarrow, Aleksander Alekseev, Amit Kapila, Ajin Cherian
Discussion: https://www.postgresql.org/message-id/CAHut%2BPtnsBr59%3D_NvxXp_%3DS-em0WxyuDOQmSTuHGb4sVhkHffg%40mail.gmail.com
---
 doc/src/sgml/logical-replication.sgml    | 551 +++++++++++++++++++++++++++++++
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 553 insertions(+)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 555fbd7..6e23be9 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -118,6 +118,8 @@
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
    particular event types.  By default, all operation types are replicated.
+   (Row filters have no effect for <command>TRUNCATE</command>. See 
+   <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +319,555 @@
   </sect2>
  </sect1>
 
+ <sect1 id="logical-replication-row-filter">
+  <title>Row Filters</title>
+
+  <para>
+   By default, all data from all published tables will be replicated to the
+   appropriate subscribers.
+  </para>
+
+  <para>
+   The replicated data can be reduced by using a <firstterm>row filter</firstterm>.
+   A user might choose to use row filters for behavioral, security or performance
+   reasons.
+  </para>
+
+  <para>
+   If a published table sets a row filter, a row is replicated only if its data
+   satisfies the row filter expression. This allows a set of tables to be
+   partially replicated.
+  </para>
+
+  <para>
+   The row filter is defined per table. Use a <literal>WHERE</literal> clause
+   after the table name for each published table that requires data to be
+   filtered out. The <literal>WHERE</literal> clause must be enclosed by
+   parentheses. See <xref linkend="sql-createpublication"/> for details.
+  </para>
+
+  <sect2 id="logical-replication-row-filter-rules">
+   <title>Row Filter Rules</title>
+
+   <para>
+    Row filters are applied <emphasis>before</emphasis> publishing the changes.
+   </para>
+
+   <para>
+    If the row filter evaluates to <literal>false</literal> or
+    <literal>NULL</literal> then the row is not replicated.
+   </para>
+
+   <para>
+    The <literal>WHERE</literal> clause expression is evaluated with the same
+    role used for the replication connection (i.e. the role specified in the
+    <literal>CONNECTION</literal> clause of the <xref linkend="sql-createsubscription"/>).
+   </para>
+
+   <para>
+    Row filters have no effect for <command>TRUNCATE</command> command.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-restrictions">
+   <title>Expression Restrictions</title>
+
+   <para>
+    The <literal>WHERE</literal> clause allows only simple expressions. It
+    cannot contain user-defined functions, operators, types, and collations,
+    system column references or non-immutable built-in functions.
+   </para>
+
+   <para>
+    If a publication publishes <command>UPDATE</command> and/or
+    <command>DELETE</command> operations, the row filter <literal>WHERE</literal> clause
+    must contain only columns that are covered by the replica identity (see
+    <xref linkend="sql-altertable-replica-identity"/>). If a publication publishes only
+    <command>INSERT</command>, the row filter <literal>WHERE</literal> clause
+    can use any column.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-transformations">
+   <title>UPDATE Transformations</title>
+
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. using the data
+    before and after the update).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>
+
+   <para>
+    If only one of the old/new rows matches the row filter expression, the
+    <command>UPDATE</command> is transformed to <command>INSERT</command> or
+    <command>DELETE</command>, to avoid any data inconsistency. The row on the
+    subscriber should reflect what is defined by the row filter expression on
+    the publisher.
+
+   <itemizedlist>
+    <listitem>
+     <para>
+      If the old row satisfies the row filter expression (it was sent to the
+      subscriber) but the new row doesn't, then from a data consistency
+      perspective the old row should be removed from the subscriber.
+      So the <command>UPDATE</command> is transformed into a <command>DELETE</command>.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      If the old row doesn't satisfy the row filter expression (it wasn't sent
+      to the subscriber) but the new row does, then from a data consistency
+      perspective the new row should be added to the subscriber.
+      So the <command>UPDATE</command> is transformed into an <command>INSERT</command>.
+     </para>
+    </listitem>
+   </itemizedlist>
+
+Summary:
+<synopsis>
+Case 1: old-row (no match)    new-row (no match)   -->   (drop change)
+Case 2: old-row (no match)    new row (match)      -->   INSERT
+Case 3: old-row (match)       new-row (no match)   -->   DELETE
+Case 4: old-row (match)       new row (match)      -->   UPDATE
+</synopsis>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-partitioned-table">
+   <title>Partitioned Tables</title>
+
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-initial-data-sync">
+   <title>Initial Data Synchronization</title>
+
+   <para>
+    If the subscription requires copying pre-existing table data
+    and a publication contains <literal>WHERE</literal> clauses, only data that
+    satisfies the row filter expressions is copied to the subscriber.
+   </para>
+
+   <para>
+    If the subscription has several publications in which a table has been
+    published with different <literal>WHERE</literal> clauses, rows that satisfy
+    <emphasis>any</emphasis> of the expressions will be copied. (see
+    <xref linkend="logical-replication-row-filter-combining"/> for details).
+   </para>
+
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-combining">
+   <title>Combining Multiple Row Filters</title>
+
+   <para>
+    If the subscription has several publications in which the same table has
+    been published with different row filters (for the same <literal>publish</literal>
+    operation), those expressions get OR'ed together, so that rows satisfying
+    <emphasis>any</emphasis> of the expressions will be replicated.
+   </para>
+
+   <para>
+    This means all the other row filters for the same table become redundant if:
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       one of the publications has no row filter.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       one of the publications was created using <literal>FOR ALL TABLES</literal>.
+       This clause does not allow row filters.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES IN SCHEMA</literal> and the table belongs to
+       the referred schema. This clause does not allow row filters.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-examples">
+   <title>Examples</title>
+
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int, e int, f int, primary key(d));
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int, h int, i int, primary key(g));
+CREATE TABLE
+</programlisting>
+   </para>
+
+   <para>
+    Create some publications.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Publication <literal>p1</literal> has 1 table (<literal>t1</literal>) and
+       that table has a row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Publication <literal>p2</literal> has 2 tables. Table <literal>t1</literal>
+       has no row filter, and table <literal>t2</literal> has a row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Publication <literal>p3</literal> has 2 tables, and both of them have a
+       row filter.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    The PSQL command <command>\dRp+</command> shows the row filter expressions
+    (if defined) for each table of the publications.
+<programlisting>
+testpub=# \dRp+
+                               Publication p1
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1" WHERE ((a > 5) AND (c = 'NSW'::text))
+
+                               Publication p2
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1"
+    "public.t2" WHERE (e = 99)
+
+                               Publication p3
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t2" WHERE (d = 10)
+    "public.t3" WHERE (g = 10)
+</programlisting>
+   </para>
+
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+<programlisting>
+testpub=# \d t1
+                 Table "public.t1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           | not null | 
+ b      | integer |           |          | 
+ c      | text    |           | not null | 
+Indexes:
+    "t1_pkey" PRIMARY KEY, btree (a, c)
+Publications:
+    "p1" WHERE ((a > 5) AND (c = 'NSW'::text))
+    "p2"
+
+testpub=# \d t2
+                 Table "public.t2"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ d      | integer |           | not null | 
+ e      | integer |           |          | 
+ f      | integer |           |          | 
+Indexes:
+    "t2_pkey" PRIMARY KEY, btree (d)
+Publications:
+    "p2" WHERE (e = 99)
+    "p3" WHERE (d = 10)
+
+testpub=# \d t3
+                 Table "public.t3"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ g      | integer |           | not null | 
+ h      | integer |           |          | 
+ i      | integer |           |          | 
+Indexes:
+    "t3_pkey" PRIMARY KEY, btree (g)
+Publications:
+    "p3" WHERE (g = 10)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Table <literal>t1</literal> is a member of 2 publications, but
+       has a row filter only in <literal>p1</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Table <literal>t2</literal> is a member of 2 publications, and
+       has a different row filter in each of them.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    On the subscriber node, create a table <literal>t1</literal> with the same
+    definition as the one on the publisher, and also create the subscription
+    <literal>s1</literal> that subscribes to the publication <literal>p1</literal>.
+<programlisting>
+testsub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testsub=# CREATE SUBSCRIPTION s1
+testsub-# CONNECTION 'host=localhost dbname=testpub application_name=s1'
+testsub-# PUBLICATION p1;
+CREATE SUBSCRIPTION
+</programlisting>
+   </para>
+
+   <para>
+    Insert some rows.
+<programlisting>
+testpub=# INSERT INTO t1 VALUES (2, 102, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (3, 103, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (4, 104, 'VIC');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (5, 105, 'ACT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (6, 106, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (7, 107, 'NT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (8, 108, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (9, 109, 'NSW');
+INSERT 0 1
+
+testpub=# SELECT * FROM t1; 
+ a |  b  |  c
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 6 | 106 | NSW
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 6 | 106 | NSW
+ 9 | 109 | NSW
+(2 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Only the rows satisfying the <literal>t1 WHERE</literal>
+       clause of publication <literal>p1</literal> are replicated.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    Update some data, where the old and new row values both
+    satisfy the <literal>t1 WHERE</literal> clause of publication
+    <literal>p1</literal>.
+<programlisting>
+testpub=# UPDATE t1 SET b = 999 WHERE a = 6;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(2 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The <command>UPDATE</command> replicates the change as normal.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    Update some data, where the old row values did not satisfy
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do satisfy it.
+<programlisting>
+testpub=# UPDATE t1 SET a = 555 WHERE a = 2;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(3 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The <command>UPDATE</command> is transformed into an <command>INSERT</command>
+       and the change is replicated.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    Update some data, where the old row values satisfied
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do not satisfy it.
+<programlisting>
+testpub=# UPDATE t1 SET c = 'VIC' WHERE a = 9;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   6 | 999 | NSW
+ 555 | 102 | NSW
+   9 | 109 | VIC
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(2 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The <command>UPDATE</command> is transformed into a <command>DELETE</command>
+       and the change is replicated.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+  </sect2>
+
+ </sect1>
+
  <sect1 id="logical-replication-conflicts">
   <title>Conflicts</title>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index fb2d013..23d883c 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -254,6 +254,8 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    <literal>publish_via_partition_root</literal> determines if it uses the
    partition's row filter (if the parameter is false, the default) or the root
    partitioned table's row filter.
+   See <xref linkend="logical-replication-row-filter"/> for details about row
+   filters.
   </para>
 
   <para>
-- 
1.8.3.1

#17

smithpb2250@gmail.com

almost 4 years ago

In reply to: Amit Kapila (#15)

Re: PG DOCS - logical replication filtering

On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 24, 2022 at 11:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
Review comments:
===============
1.
+    The <literal>WHERE</literal> clause expression is evaluated with the same
+    role used for the replication connection. i.e. the role specified in the
+    <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>.
Can we use () around i.e. sentence? It looks bit odd to me now.
The <literal>WHERE</literal> clause expression is evaluated with the
same role used for the replication connection (i.e., the role
specified in the <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>).

OK. Modified in v6 [1]/messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com.

2.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
Can we write the below part slightly differently?
Before:
(i.e. before and after the data is updated).
After:
(i.e the data before and after the update).

OK. Modified in v6 [1]/messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com.

3.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>

I think we can combine these three separate paragraphs.

The first sentence is the explanation, then there are the 3 separate
“If …” cases mentioned. It doesn’t seem quite right to group just the
first 2 “if…” cases into one paragraph, while leaving the 3rd one
separated. OTOH combining everything into one big paragraph seems
worse. Please confirm if you still want this changed.

4.
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>

I think we can combine this into single para as Euler had in his version.

We can do it, but I am not sure if your review was looking at the
rendered HTML or just looking at the SGML text? IMO using bullets here
ended up being more readable (it is also consistent with other bullet
usages on this page). Please confirm if you still want this changed.

5.
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored
when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>

I don't see the need for the first <note> here, the second one seems
to convey it.

Well, the 2nd note is only about compatibility with older versions
doing the subscribe. But the 1st note is not version-specific at all.
It is saying that the COPY does not take the “publish” option into
account. If you know of some other docs already mentioning this subtle
behaviour of the COPY then I can remove this note and just
cross-reference to the other place. But I did not know anywhere this
is already mentioned, so that is why I wrote the note about it.

6.
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int primary key, e int, f int);
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int primary key, h int, i int);
+CREATE TABLE
+</programlisting>
+   </para>
+
+   <para>
+    Create some publications.
+   </para>
+   <para>
+    - notice publication <literal>p1</literal> has 1 table with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p2</literal> has 2 tables, one without a row
+    filter, and one with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p3</literal> has 2 tables, both
with row filters.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+   </para>

I think it is better to use the corresponding content from Euler's version.

OK. Modified in v6 [1]/messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com. I changed the primary key syntax to be the
same as Euler had. I also moved all the 'notice' parts to below the
corresponding example and modified the text.

7.
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+   </para>
+   <para>
+    - notice table <literal>t1</literal> is a member of 2 publications, but
+    has a row filter only in <literal>p1</literal>.
+   </para>
+   <para>
+    - notice table <literal>t2</literal> is a member of 2 publications, and
+    has a different row filter in each of them.

This looks unnecessary to me. Let's remove this part.

I thought something is needed to explain/demonstrate how the user can
know the different row filters for all the publications that the same
table is a member of. Otherwise, the user has to guess (??) what
publications are using their table and then use \dRp+ to dig at all
those publications to find the row filters.

I can remove all this part from the Examples, but I think at least the
\d should still be mentioned somewhere. IMO I should put that "PSQL
commands" section back (which existed in an earlier version) and just
add a sentence about this. Then this examples part can be removed.
What do you think?

8.
+   <para>
+    - notice that only the rows satisfying the <literal>t1 WHERE</literal>
+    clause of publication <literal>p1</literal> are replicated.
Again, it is better to use Euler's version for this and at the place,
he had in his version. Similarly, adjust other notices if any like
this one.

OK. Modified in v6 [1]/messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com. Every “notice” has now been moved to follow
the associated example (how Euler had them)

9. I suggest adding an example for partition tables showing how the
row filter is used based on the 'publish_via_partition_root'
parameter.

OK - I am working on this and will add it in a future patch version.

------
[1]: /messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#18

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 4 years ago

In reply to: Peter Smith (#16)

Re: PG DOCS - logical replication filtering

On 2022-Apr-08, Peter Smith wrote:

PSA patch v6 to address some of Amit's review comments [1].

I think the text is good (didn't read it all, just about the first
half), but why is there one paragraph per sentence? Seems a bit too
sparse.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick." (Andrew Sullivan)

#19

amit.kapila16@gmail.com

almost 4 years ago

In reply to: Peter Smith (#17)

Re: PG DOCS - logical replication filtering

On Fri, Apr 8, 2022 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
3.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>
I think we can combine these three separate paragraphs.
The first sentence is the explanation, then there are the 3 separate
“If …” cases mentioned. It doesn’t seem quite right to group just the
first 2 “if…” cases into one paragraph, while leaving the 3rd one
separated. OTOH combining everything into one big paragraph seems
worse. Please confirm if you still want this changed.

Yeah, I think we should have something like what Euler's version had
and maybe keep a summary section from the current patch.

4.
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>
I think we can combine this into single para as Euler had in his version.
We can do it, but I am not sure if your review was looking at the
rendered HTML or just looking at the SGML text? IMO using bullets here
ended up being more readable (it is also consistent with other bullet
usages on this page). Please confirm if you still want this changed.

Fair enough. We can keep this part as it is.

5.
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored
when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>
I don't see the need for the first <note> here, the second one seems
to convey it.
Well, the 2nd note is only about compatibility with older versions
doing the subscribe. But the 1st note is not version-specific at all.
It is saying that the COPY does not take the “publish” option into
account. If you know of some other docs already mentioning this subtle
behaviour of the COPY then I can remove this note and just
cross-reference to the other place. But I did not know anywhere this
is already mentioned, so that is why I wrote the note about it.

I don't see the need to say about general initial sync (COPY) behavior
here. It is already defined at [1]. If we want to enhance, we can do
that as a separate patch to make changes where the initial sync is
explained. I am not sure that is required though.

7.
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+   </para>
+   <para>
+    - notice table <literal>t1</literal> is a member of 2 publications, but
+    has a row filter only in <literal>p1</literal>.
+   </para>
+   <para>
+    - notice table <literal>t2</literal> is a member of 2 publications, and
+    has a different row filter in each of them.
This looks unnecessary to me. Let's remove this part.
I thought something is needed to explain/demonstrate how the user can
know the different row filters for all the publications that the same
table is a member of. Otherwise, the user has to guess (??) what
publications are using their table and then use \dRp+ to dig at all
those publications to find the row filters.

I can remove all this part from the Examples, but I think at least the
\d should still be mentioned somewhere. IMO I should put that "PSQL
commands" section back (which existed in an earlier version) and just
add a sentence about this. Then this examples part can be removed.
What do you think?

I think the way it is changed in the current patch by moving that
explanation down seems okay to me.

I feel in the initial "Row Filters" and "Row Filter Rules" sections,
we don't need to have separate paragraphs. I think the same is pointed
out by Alvaro as well.

--
With Regards,
Amit Kapila.

#20

------
[1a] /messages/by-id/CAHut+PvDFWGUOBugYMtcXhAiViZu+Q6P-kxw2+U83VOGx0Osdg@mail.gmail.com
[1b] /messages/by-id/CAA4eK1JPyVoc1dUjeqbPd9D0_uYxWyyx-8fcsrgiZ5Tpr9OAuw@mail.gmail.com
[2]: /messages/by-id/202204091045.v2ei4yupxqso@alvherre.pgsql

smithpb2250@gmail.com

almost 4 years ago

In reply to: Peter Smith (#16)

1 attachment(s)

Re: PG DOCS - logical replication filtering

PSA patch v7 which addresses all the remaining review comments (from
Amit [1a][1b], and from Alvaro [2]/messages/by-id/202204091045.v2ei4yupxqso@alvherre.pgsql).

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v7-0001-PG-DOCS-page-for-row-filters.patchapplication/octet-stream; name=v7-0001-PG-DOCS-page-for-row-filters.patchDownload

From a1734d6dfdf4c0f3582d9b0da55fa383cf12ac19 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 11 Apr 2022 16:54:54 +1000
Subject: [PATCH v7] PG DOCS page for row filters.

This patch introduces a new documentation page for describing the "Row Filters" feature.

Author: Peter Smith, Euler Taveira
Reviewed By: Greg Nancarrow, Aleksander Alekseev, Amit Kapila, Ajin Cherian
Discussion: https://www.postgresql.org/message-id/CAHut%2BPtnsBr59%3D_NvxXp_%3DS-em0WxyuDOQmSTuHGb4sVhkHffg%40mail.gmail.com
---
 doc/src/sgml/logical-replication.sgml    | 642 +++++++++++++++++++++++++++++++
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 644 insertions(+)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 555fbd7..ca47fe1 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -118,6 +118,8 @@
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
    particular event types.  By default, all operation types are replicated.
+   (Row filters have no effect for <command>TRUNCATE</command>. See 
+   <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +319,646 @@
   </sect2>
  </sect1>
 
+ <sect1 id="logical-replication-row-filter">
+  <title>Row Filters</title>
+
+  <para>
+   By default, all data from all published tables will be replicated to the
+   appropriate subscribers. The replicated data can be reduced by using a 
+   <firstterm>row filter</firstterm>. A user might choose to use row filters
+   for behavioral, security or performance reasons. If a published table sets a
+   row filter, a row is replicated only if its data satisfies the row filter
+   expression. This allows a set of tables to be partially replicated. The row
+   filter is defined per table. Use a <literal>WHERE</literal> clause after the
+   table name for each published table that requires data to be filtered out.
+   The <literal>WHERE</literal> clause must be enclosed by parentheses. See
+   <xref linkend="sql-createpublication"/> for details.
+  </para>
+
+  <sect2 id="logical-replication-row-filter-rules">
+   <title>Row Filter Rules</title>
+
+   <para>
+    Row filters are applied <emphasis>before</emphasis> publishing the changes.
+    If the row filter evaluates to <literal>false</literal> or <literal>NULL</literal>
+    then the row is not replicated. The <literal>WHERE</literal> clause expression
+    is evaluated with the same role used for the replication connection (i.e.
+    the role specified in the <literal>CONNECTION</literal> clause of the
+    <xref linkend="sql-createsubscription"/>). Row filters have no effect for
+    <command>TRUNCATE</command> command.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-restrictions">
+   <title>Expression Restrictions</title>
+
+   <para>
+    The <literal>WHERE</literal> clause allows only simple expressions. It
+    cannot contain user-defined functions, operators, types, and collations,
+    system column references or non-immutable built-in functions.
+   </para>
+
+   <para>
+    If a publication publishes <command>UPDATE</command> and/or
+    <command>DELETE</command> operations, the row filter <literal>WHERE</literal> clause
+    must contain only columns that are covered by the replica identity (see
+    <xref linkend="sql-altertable-replica-identity"/>). If a publication publishes only
+    <command>INSERT</command>, the row filter <literal>WHERE</literal> clause
+    can use any column.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-transformations">
+   <title>UPDATE Transformations</title>
+
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. using the data
+    before and after the update). If both evaluations are <literal>true</literal>,
+    it replicates the <command>UPDATE</command> change. If both evaluations are
+    <literal>false</literal>, it doesn't replicate the change. If only one of
+    the old/new rows matches the row filter expression, the <command>UPDATE</command>
+    is transformed to <command>INSERT</command> or <command>DELETE</command>, to
+    avoid any data inconsistency. The row on the subscriber should reflect what
+    is defined by the row filter expression on the publisher.
+   </para>
+
+   <para>
+    If the old row satisfies the row filter expression (it was sent to the
+    subscriber) but the new row doesn't, then from a data consistency
+    perspective the old row should be removed from the subscriber.
+    So the <command>UPDATE</command> is transformed into a <command>DELETE</command>.
+   </para>
+
+   <para>
+    If the old row doesn't satisfy the row filter expression (it wasn't sent
+    to the subscriber) but the new row does, then from a data consistency
+    perspective the new row should be added to the subscriber.
+    So the <command>UPDATE</command> is transformed into an <command>INSERT</command>.
+   </para>
+
+   <para>
+Summary:
+<synopsis>
+Case 1: old-row (no match)    new-row (no match)   -->   (drop change)
+Case 2: old-row (no match)    new row (match)      -->   INSERT
+Case 3: old-row (match)       new-row (no match)   -->   DELETE
+Case 4: old-row (match)       new row (match)      -->   UPDATE
+</synopsis></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-partitioned-table">
+   <title>Partitioned Tables</title>
+
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-initial-data-sync">
+   <title>Initial Data Synchronization</title>
+
+   <para>
+    If the subscription requires copying pre-existing table data
+    and a publication contains <literal>WHERE</literal> clauses, only data that
+    satisfies the row filter expressions is copied to the subscriber.
+   </para>
+
+   <para>
+    If the subscription has several publications in which a table has been
+    published with different <literal>WHERE</literal> clauses, rows that satisfy
+    <emphasis>any</emphasis> of the expressions will be copied. (see
+    <xref linkend="logical-replication-row-filter-combining"/> for details).
+   </para>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-combining">
+   <title>Combining Multiple Row Filters</title>
+
+   <para>
+    If the subscription has several publications in which the same table has
+    been published with different row filters (for the same <literal>publish</literal>
+    operation), those expressions get OR'ed together, so that rows satisfying
+    <emphasis>any</emphasis> of the expressions will be replicated.
+   </para>
+
+   <para>
+    This means all the other row filters for the same table become redundant if:
+    <itemizedlist>
+     <listitem>
+      <para>
+       one of the publications has no row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using <literal>FOR ALL TABLES</literal>.
+       This clause does not allow row filters.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES IN SCHEMA</literal> and the table belongs to
+       the referred schema. This clause does not allow row filters.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-examples">
+   <title>Examples</title>
+
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int, e int, f int, primary key(d));
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int, h int, i int, primary key(g));
+CREATE TABLE
+</programlisting></para>
+
+   <para>
+    Create some publications.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Publication <literal>p1</literal> has one table (<literal>t1</literal>) and
+       that table has a row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Publication <literal>p2</literal> has two tables. Table <literal>t1</literal>
+       has no row filter, and table <literal>t2</literal> has a row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Publication <literal>p3</literal> has two tables, and both of them have a
+       row filter.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    The PSQL command <command>\dRp+</command> shows the row filter expressions
+    (if defined) for each table of the publications.
+<programlisting>
+testpub=# \dRp+
+                               Publication p1
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1" WHERE ((a > 5) AND (c = 'NSW'::text))
+
+                               Publication p2
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1"
+    "public.t2" WHERE (e = 99)
+
+                               Publication p3
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t2" WHERE (d = 10)
+    "public.t3" WHERE (g = 10)
+</programlisting></para>
+
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+<programlisting>
+testpub=# \d t1
+                 Table "public.t1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           | not null | 
+ b      | integer |           |          | 
+ c      | text    |           | not null | 
+Indexes:
+    "t1_pkey" PRIMARY KEY, btree (a, c)
+Publications:
+    "p1" WHERE ((a > 5) AND (c = 'NSW'::text))
+    "p2"
+
+testpub=# \d t2
+                 Table "public.t2"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ d      | integer |           | not null | 
+ e      | integer |           |          | 
+ f      | integer |           |          | 
+Indexes:
+    "t2_pkey" PRIMARY KEY, btree (d)
+Publications:
+    "p2" WHERE (e = 99)
+    "p3" WHERE (d = 10)
+
+testpub=# \d t3
+                 Table "public.t3"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ g      | integer |           | not null | 
+ h      | integer |           |          | 
+ i      | integer |           |          | 
+Indexes:
+    "t3_pkey" PRIMARY KEY, btree (g)
+Publications:
+    "p3" WHERE (g = 10)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Table <literal>t1</literal> is a member of two publications, but
+       has a row filter only in <literal>p1</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Table <literal>t2</literal> is a member of two publications, and
+       has a different row filter in each of them.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    On the subscriber node, create a table <literal>t1</literal> with the same
+    definition as the one on the publisher, and also create the subscription
+    <literal>s1</literal> that subscribes to the publication <literal>p1</literal>.
+<programlisting>
+testsub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testsub=# CREATE SUBSCRIPTION s1
+testsub-# CONNECTION 'host=localhost dbname=testpub application_name=s1'
+testsub-# PUBLICATION p1;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Insert some rows.
+<programlisting>
+testpub=# INSERT INTO t1 VALUES (2, 102, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (3, 103, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (4, 104, 'VIC');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (5, 105, 'ACT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (6, 106, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (7, 107, 'NT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (8, 108, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (9, 109, 'NSW');
+INSERT 0 1
+
+testpub=# SELECT * FROM t1; 
+ a |  b  |  c
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 6 | 106 | NSW
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 6 | 106 | NSW
+ 9 | 109 | NSW
+(2 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Only the rows satisfying the <literal>t1 WHERE</literal>
+       clause of publication <literal>p1</literal> are replicated.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    Update some data, where the old and new row values both
+    satisfy the <literal>t1 WHERE</literal> clause of publication
+    <literal>p1</literal>.
+<programlisting>
+testpub=# UPDATE t1 SET b = 999 WHERE a = 6;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(2 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The <command>UPDATE</command> replicates the change as normal.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    Update some data, where the old row values did not satisfy
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do satisfy it.
+<programlisting>
+testpub=# UPDATE t1 SET a = 555 WHERE a = 2;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(3 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The <command>UPDATE</command> is transformed into an <command>INSERT</command>
+       and the change is replicated.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    Update some data, where the old row values satisfied
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do not satisfy it.
+<programlisting>
+testpub=# UPDATE t1 SET c = 'VIC' WHERE a = 9;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   6 | 999 | NSW
+ 555 | 102 | NSW
+   9 | 109 | VIC
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(2 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The <command>UPDATE</command> is transformed into a <command>DELETE</command>
+       and the change is replicated.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-example-partitions">
+   <title>Example (Partitioned tables)</title>
+
+   <para>
+    Create the same partitioned tables on the publisher and subscriber.
+<programlisting>
+testpub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
+CREATE TABLE
+testpub=# CREATE TABLE child PARTITION OF parent DEFAULT;
+CREATE TABLE
+</programlisting>
+<programlisting>
+testsub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
+CREATE TABLE
+testsub=# CREATE TABLE child PARTITION OF parent DEFAULT;
+CREATE TABLE
+</programlisting></para>
+
+   <para>
+    Create a publication <literal>p4</literal>, and then subscribe to it.
+<programlisting>
+testpub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a &lt; 5), child WHERE (a >= 5)
+testpub-# WITH (publish_via_partition_root=true);
+CREATE PUBLICATION
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The publication is using <literal>publish_via_partition_root</literal> set true.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       There are row filters defined on both the partitioned table
+       (<literal>parent</literal>), and on the partition (<literal>child</literal>).
+      </para>
+     </listitem>
+    </itemizedlist>
+<programlisting>
+testsub=# CREATE SUBSCRIPTION s4
+testsub-# CONNECTION 'host=localhost dbname=testpub application_name=s4'
+testsub-# PUBLICATION p4;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Insert some values directly into the <literal>parent</literal> and
+    <literal>child</literal> tables.
+<programlisting>
+test_pub=# INSERT INTO parent VALUES (2), (4), (6);
+INSERT 0 3
+test_pub=# INSERT INTO child VALUES (3), (5), (7);
+INSERT 0 3
+</programlisting>
+<programlisting>
+test_pub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+(6 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+(3 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The row filter of <literal>parent</literal> was used (because
+       <literal>publish_via_partition_root</literal> was true).
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    Repeat the same test, but with a different value for <literal>publish_via_partition_root</literal>.
+<programlisting>
+testpub=# DROP PUBLICATION p4;
+DROP PUBLICATION
+testpub=# CREATE PUBLICATION p4 FOR TABLE parent, child WHERE (a >= 5)
+testpub-# WITH (publish_via_partition_root=false);
+CREATE PUBLICATION
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The publication is using <literal>publish_via_partition_root</literal> set false.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       A row filter is defined on the partition (<literal>child</literal>).
+      </para>
+     </listitem>
+    </itemizedlist>
+<programlisting>
+testsub=# ALTER SUBSCRIPTION s4 REFRESH PUBLICATION;
+ALTER SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Do the inserts same as before.
+<programlisting>
+testpub=# TRUNCATE parent;
+TRUNCATE TABLE
+testpub=# INSERT INTO parent VALUES (2), (4), (6);
+INSERT 0 3
+testpub=# INSERT INTO child VALUES (3), (5), (7);
+INSERT 0 3
+
+testpub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+(6 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM child ORDER BY a;
+ a 
+---
+ 5
+ 6
+ 7
+(3 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The row filter of <literal>child</literal> was used (because
+       <literal>publish_via_partition_root</literal> was false).
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+  </sect2>
+
+ </sect1>
+
  <sect1 id="logical-replication-conflicts">
   <title>Conflicts</title>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index fb2d013..23d883c 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -254,6 +254,8 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    <literal>publish_via_partition_root</literal> determines if it uses the
    partition's row filter (if the parameter is false, the default) or the root
    partitioned table's row filter.
+   See <xref linkend="logical-replication-row-filter"/> for details about row
+   filters.
   </para>
 
   <para>
-- 
1.8.3.1

#21

smithpb2250@gmail.com

almost 4 years ago

In reply to: Peter Smith (#17)

Re: PG DOCS - logical replication filtering

On Fri, Apr 8, 2022 at 4:12 PM Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 24, 2022 at 11:48 AM Peter Smith <smithpb2250@gmail.com> wrote:

Review comments:
===============

[snip]

9. I suggest adding an example for partition tables showing how the
row filter is used based on the 'publish_via_partition_root'
parameter.

OK - I am working on this and will add it in a future patch version.

OK. Added in v7 [1]/messages/by-id/CAHut+Pt1X=3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ@mail.gmail.com

------
[1]: /messages/by-id/CAHut+Pt1X=3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#22

smithpb2250@gmail.com

almost 4 years ago

In reply to: Amit Kapila (#19)

Re: PG DOCS - logical replication filtering

On Mon, Apr 11, 2022 at 1:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 8, 2022 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
3.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>
I think we can combine these three separate paragraphs.
The first sentence is the explanation, then there are the 3 separate
“If …” cases mentioned. It doesn’t seem quite right to group just the
first 2 “if…” cases into one paragraph, while leaving the 3rd one
separated. OTOH combining everything into one big paragraph seems
worse. Please confirm if you still want this changed.
Yeah, I think we should have something like what Euler's version had
and maybe keep a summary section from the current patch.

Modified in v7 [1]/messages/by-id/CAHut+Pt1X=3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ@mail.gmail.com. Removed the bullets. Structured the text
paragraphs the same way that Euler had it. The summary is kept as-is.

5.
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored
when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>
I don't see the need for the first <note> here, the second one seems
to convey it.
Well, the 2nd note is only about compatibility with older versions
doing the subscribe. But the 1st note is not version-specific at all.
It is saying that the COPY does not take the “publish” option into
account. If you know of some other docs already mentioning this subtle
behaviour of the COPY then I can remove this note and just
cross-reference to the other place. But I did not know anywhere this
is already mentioned, so that is why I wrote the note about it.
I don't see the need to say about general initial sync (COPY) behavior
here. It is already defined at [1]. If we want to enhance, we can do
that as a separate patch to make changes where the initial sync is
explained. I am not sure that is required though.

Did you miss providing the link URL? Anyway, I removed the note in v7
[1]: /messages/by-id/CAHut+Pt1X=3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ@mail.gmail.com
at all).

I feel in the initial "Row Filters" and "Row Filter Rules" sections,
we don't need to have separate paragraphs. I think the same is pointed
out by Alvaro as well.

Modified in v7 [1]/messages/by-id/CAHut+Pt1X=3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ@mail.gmail.com those sections as suggested. I also assumed these
were the same sections that Alvaro was referring to.

------
[1]: /messages/by-id/CAHut+Pt1X=3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#23

amit.kapila16@gmail.com

almost 4 years ago

In reply to: Peter Smith (#21)

1 attachment(s)

Re: PG DOCS - logical replication filtering

On Mon, Apr 11, 2022 at 12:39 PM Peter Smith <smithpb2250@gmail.com> wrote:

On Fri, Apr 8, 2022 at 4:12 PM Peter Smith <smithpb2250@gmail.com> wrote:

OK. Added in v7 [1]

Thanks, this looks mostly good to me. I didn't like the new section
added for partitioned tables examples, so I removed it and added some
explanation of the tests. I have slightly changed a few other lines. I
am planning to commit the attached tomorrow unless there are more
comments.

--
With Regards,
Amit Kapila.

Attachments:

v8-0001-Add-additional-documentation-for-row-filters.patchapplication/octet-stream; name=v8-0001-Add-additional-documentation-for-row-filters.patchDownload

From 3ca87f788d96939632d3cb9c3221d6c0ed9bc712 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Mon, 11 Apr 2022 15:11:59 +0530
Subject: [PATCH v8] Add additional documentation for row filters.

Commit 52e4f0cd47 added a feature to allow specifying row filters for
logical replication of tables. This patch adds detailed documentation on
that feature including examples to make it easier for users to understand.

Author: Peter Smith, Euler Taveira
Reviewed By: Greg Nancarrow, Aleksander Alekseev, Amit Kapila, Ajin Cherian, Alvaro Herrera
---
 doc/src/sgml/logical-replication.sgml    | 646 +++++++++++++++++++++++
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 648 insertions(+)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 555fbd749c..04793c51be 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -118,6 +118,8 @@
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
    particular event types.  By default, all operation types are replicated.
+   (Row filters have no effect for <command>TRUNCATE</command>. See 
+   <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +319,650 @@
   </sect2>
  </sect1>
 
+ <sect1 id="logical-replication-row-filter">
+  <title>Row Filters</title>
+
+  <para>
+   By default, all data from all published tables will be replicated to the
+   appropriate subscribers. The replicated data can be reduced by using a 
+   <firstterm>row filter</firstterm>. A user might choose to use row filters
+   for behavioral, security or performance reasons. If a published table sets a
+   row filter, a row is replicated only if its data satisfies the row filter
+   expression. This allows a set of tables to be partially replicated. The row
+   filter is defined per table. Use a <literal>WHERE</literal> clause after the
+   table name for each published table that requires data to be filtered out.
+   The <literal>WHERE</literal> clause must be enclosed by parentheses. See
+   <xref linkend="sql-createpublication"/> for details.
+  </para>
+
+  <sect2 id="logical-replication-row-filter-rules">
+   <title>Row Filter Rules</title>
+
+   <para>
+    Row filters are applied <emphasis>before</emphasis> publishing the changes.
+    If the row filter evaluates to <literal>false</literal> or <literal>NULL</literal>
+    then the row is not replicated. The <literal>WHERE</literal> clause expression
+    is evaluated with the same role used for the replication connection (i.e.
+    the role specified in the <literal>CONNECTION</literal> clause of the
+    <xref linkend="sql-createsubscription"/>). Row filters have no effect for
+    <command>TRUNCATE</command> command.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-restrictions">
+   <title>Expression Restrictions</title>
+
+   <para>
+    The <literal>WHERE</literal> clause allows only simple expressions. It
+    cannot contain user-defined functions, operators, types, and collations,
+    system column references or non-immutable built-in functions.
+   </para>
+
+   <para>
+    If a publication publishes <command>UPDATE</command> and/or
+    <command>DELETE</command> operations, the row filter <literal>WHERE</literal> clause
+    must contain only columns that are covered by the replica identity (see
+    <xref linkend="sql-altertable-replica-identity"/>). If a publication publishes only
+    <command>INSERT</command>, the row filter <literal>WHERE</literal> clause
+    can use any column.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-transformations">
+   <title>UPDATE Transformations</title>
+
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. using the data
+    before and after the update). If both evaluations are <literal>true</literal>,
+    it replicates the <command>UPDATE</command> change. If both evaluations are
+    <literal>false</literal>, it doesn't replicate the change. If only one of
+    the old/new rows matches the row filter expression, the <command>UPDATE</command>
+    is transformed to <command>INSERT</command> or <command>DELETE</command>, to
+    avoid any data inconsistency. The row on the subscriber should reflect what
+    is defined by the row filter expression on the publisher.
+   </para>
+
+   <para>
+    If the old row satisfies the row filter expression (it was sent to the
+    subscriber) but the new row doesn't, then from a data consistency
+    perspective the old row should be removed from the subscriber.
+    So the <command>UPDATE</command> is transformed into a <command>DELETE</command>.
+   </para>
+
+   <para>
+    If the old row doesn't satisfy the row filter expression (it wasn't sent
+    to the subscriber) but the new row does, then from a data consistency
+    perspective the new row should be added to the subscriber.
+    So the <command>UPDATE</command> is transformed into an <command>INSERT</command>.
+   </para>
+
+   <para>
+Summary:
+<synopsis>
+Case 1: old-row (no match)    new-row (no match)   -->   (drop change)
+Case 2: old-row (no match)    new row (match)      -->   INSERT
+Case 3: old-row (match)       new-row (no match)   -->   DELETE
+Case 4: old-row (match)       new row (match)      -->   UPDATE
+</synopsis></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-partitioned-table">
+   <title>Partitioned Tables</title>
+
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is <literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-initial-data-sync">
+   <title>Initial Data Synchronization</title>
+
+   <para>
+    If the subscription requires copying pre-existing table data
+    and a publication contains <literal>WHERE</literal> clauses, only data that
+    satisfies the row filter expressions is copied to the subscriber.
+   </para>
+
+   <para>
+    If the subscription has several publications in which a table has been
+    published with different <literal>WHERE</literal> clauses, rows that satisfy
+    <emphasis>any</emphasis> of the expressions will be copied. (see
+    <xref linkend="logical-replication-row-filter-combining"/> for details).
+   </para>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-combining">
+   <title>Combining Multiple Row Filters</title>
+
+   <para>
+    If the subscription has several publications in which the same table has
+    been published with different row filters (for the same <literal>publish</literal>
+    operation), those expressions get OR'ed together, so that rows satisfying
+    <emphasis>any</emphasis> of the expressions will be replicated.
+   </para>
+
+   <para>
+    This means all the other row filters for the same table become redundant if:
+    <itemizedlist>
+     <listitem>
+      <para>
+       one of the publications has no row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using <literal>FOR ALL TABLES</literal>.
+       This clause does not allow row filters.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES IN SCHEMA</literal> and the table belongs to
+       the referred schema. This clause does not allow row filters.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-examples">
+   <title>Examples</title>
+
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int, e int, f int, primary key(d));
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int, h int, i int, primary key(g));
+CREATE TABLE
+</programlisting></para>
+
+   <para>
+    Create some publications.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Publication <literal>p1</literal> has one table (<literal>t1</literal>) and
+       that table has a row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Publication <literal>p2</literal> has two tables. Table <literal>t1</literal>
+       has no row filter, and table <literal>t2</literal> has a row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Publication <literal>p3</literal> has two tables, and both of them have a
+       row filter.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    The PSQL command <command>\dRp+</command> shows the row filter expressions
+    (if defined) for each table of the publications.
+<programlisting>
+testpub=# \dRp+
+                               Publication p1
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1" WHERE ((a > 5) AND (c = 'NSW'::text))
+
+                               Publication p2
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1"
+    "public.t2" WHERE (e = 99)
+
+                               Publication p3
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t2" WHERE (d = 10)
+    "public.t3" WHERE (g = 10)
+</programlisting></para>
+
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+<programlisting>
+testpub=# \d t1
+                 Table "public.t1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           | not null | 
+ b      | integer |           |          | 
+ c      | text    |           | not null | 
+Indexes:
+    "t1_pkey" PRIMARY KEY, btree (a, c)
+Publications:
+    "p1" WHERE ((a > 5) AND (c = 'NSW'::text))
+    "p2"
+
+testpub=# \d t2
+                 Table "public.t2"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ d      | integer |           | not null | 
+ e      | integer |           |          | 
+ f      | integer |           |          | 
+Indexes:
+    "t2_pkey" PRIMARY KEY, btree (d)
+Publications:
+    "p2" WHERE (e = 99)
+    "p3" WHERE (d = 10)
+
+testpub=# \d t3
+                 Table "public.t3"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ g      | integer |           | not null | 
+ h      | integer |           |          | 
+ i      | integer |           |          | 
+Indexes:
+    "t3_pkey" PRIMARY KEY, btree (g)
+Publications:
+    "p3" WHERE (g = 10)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Table <literal>t1</literal> is a member of two publications, but
+       has a row filter only in <literal>p1</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Table <literal>t2</literal> is a member of two publications, and
+       has a different row filter in each of them.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    On the subscriber node, create a table <literal>t1</literal> with the same
+    definition as the one on the publisher, and also create the subscription
+    <literal>s1</literal> that subscribes to the publication <literal>p1</literal>.
+<programlisting>
+testsub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testsub=# CREATE SUBSCRIPTION s1
+testsub-# CONNECTION 'host=localhost dbname=testpub application_name=s1'
+testsub-# PUBLICATION p1;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Insert some rows.
+<programlisting>
+testpub=# INSERT INTO t1 VALUES (2, 102, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (3, 103, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (4, 104, 'VIC');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (5, 105, 'ACT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (6, 106, 'NSW');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (7, 107, 'NT');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (8, 108, 'QLD');
+INSERT 0 1
+testpub=# INSERT INTO t1 VALUES (9, 109, 'NSW');
+INSERT 0 1
+
+testpub=# SELECT * FROM t1; 
+ a |  b  |  c
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 6 | 106 | NSW
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 6 | 106 | NSW
+ 9 | 109 | NSW
+(2 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       Only the rows satisfying the <literal>t1 WHERE</literal>
+       clause of publication <literal>p1</literal> are replicated.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    Update some data, where the old and new row values both
+    satisfy the <literal>t1 WHERE</literal> clause of publication
+    <literal>p1</literal>.
+<programlisting>
+testpub=# UPDATE t1 SET b = 999 WHERE a = 6;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(2 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The <command>UPDATE</command> replicates the change as normal.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    Update some data, where the old row values did not satisfy
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do satisfy it.
+<programlisting>
+testpub=# UPDATE t1 SET a = 555 WHERE a = 2;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(3 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The <command>UPDATE</command> is transformed into an <command>INSERT</command>
+       and the change is replicated.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    Update some data, where the old row values satisfied
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do not satisfy it.
+<programlisting>
+testpub=# UPDATE t1 SET c = 'VIC' WHERE a = 9;
+UPDATE 1
+
+testpub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   6 | 999 | NSW
+ 555 | 102 | NSW
+   9 | 109 | VIC
+(8 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(2 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The <command>UPDATE</command> is transformed into a <command>DELETE</command>
+       and the change is replicated.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    The following examples show how the publication parameter
+    <literal>publish_via_partition_root</literal> determines whether the row
+    filter of the parent or child table will be used in the case of partitioned
+    tables.
+   </para>
+
+   <para>
+    Create the same partitioned tables on the publisher and subscriber.
+<programlisting>
+testpub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
+CREATE TABLE
+testpub=# CREATE TABLE child PARTITION OF parent DEFAULT;
+CREATE TABLE
+</programlisting>
+<programlisting>
+testsub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
+CREATE TABLE
+testsub=# CREATE TABLE child PARTITION OF parent DEFAULT;
+CREATE TABLE
+</programlisting></para>
+
+   <para>
+    Create a publication <literal>p4</literal>, and then subscribe to it.
+<programlisting>
+testpub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a &lt; 5), child WHERE (a >= 5)
+testpub-# WITH (publish_via_partition_root=true);
+CREATE PUBLICATION
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The publication parameter <literal>publish_via_partition_root</literal>
+       is set as true.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       There are row filters defined on both the partitioned table
+       (<literal>parent</literal>), and on the partition (<literal>child</literal>).
+      </para>
+     </listitem>
+    </itemizedlist>
+<programlisting>
+testsub=# CREATE SUBSCRIPTION s4
+testsub-# CONNECTION 'host=localhost dbname=testpub application_name=s4'
+testsub-# PUBLICATION p4;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Insert some values directly into the <literal>parent</literal> and
+    <literal>child</literal> tables.
+<programlisting>
+test_pub=# INSERT INTO parent VALUES (2), (4), (6);
+INSERT 0 3
+test_pub=# INSERT INTO child VALUES (3), (5), (7);
+INSERT 0 3
+</programlisting>
+<programlisting>
+test_pub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+(6 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+(3 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The row filter of <literal>parent</literal> was used (because
+       <literal>publish_via_partition_root</literal> was true).
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+   <para>
+    Repeat the same test, but with a different value for <literal>publish_via_partition_root</literal>.
+<programlisting>
+testpub=# DROP PUBLICATION p4;
+DROP PUBLICATION
+testpub=# CREATE PUBLICATION p4 FOR TABLE parent, child WHERE (a >= 5)
+testpub-# WITH (publish_via_partition_root=false);
+CREATE PUBLICATION
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The publication parameter <literal>publish_via_partition_root</literal>
+       is set as false.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       A row filter is defined on the partition (<literal>child</literal>).
+      </para>
+     </listitem>
+    </itemizedlist>
+<programlisting>
+testsub=# ALTER SUBSCRIPTION s4 REFRESH PUBLICATION;
+ALTER SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Do the inserts same as before.
+<programlisting>
+testpub=# TRUNCATE parent;
+TRUNCATE TABLE
+testpub=# INSERT INTO parent VALUES (2), (4), (6);
+INSERT 0 3
+testpub=# INSERT INTO child VALUES (3), (5), (7);
+INSERT 0 3
+
+testpub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+(6 rows)
+</programlisting>
+<programlisting>
+testsub=# SELECT * FROM child ORDER BY a;
+ a 
+---
+ 5
+ 6
+ 7
+(3 rows)
+</programlisting>
+    <itemizedlist>
+     <listitem>
+      <para>
+       The row filter of <literal>child</literal> was used (because
+       <literal>publish_via_partition_root</literal> was false).
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+  </sect2>
+
+ </sect1>
+
  <sect1 id="logical-replication-conflicts">
   <title>Conflicts</title>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index fb2d013393..23d883c158 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -254,6 +254,8 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    <literal>publish_via_partition_root</literal> determines if it uses the
    partition's row filter (if the parameter is false, the default) or the root
    partitioned table's row filter.
+   See <xref linkend="logical-replication-row-filter"/> for details about row
+   filters.
   </para>
 
   <para>
-- 
2.28.0.windows.1

#24

[1]: https://www.postgresql.org/docs/devel/sql-createpublication.html

euler@eulerto.com

almost 4 years ago

In reply to: Amit Kapila (#23)

Re: PG DOCS - logical replication filtering

On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:

On Mon, Apr 11, 2022 at 12:39 PM Peter Smith <smithpb2250@gmail.com> wrote:

On Fri, Apr 8, 2022 at 4:12 PM Peter Smith <smithpb2250@gmail.com> wrote:

OK. Added in v7 [1]

Thanks, this looks mostly good to me. I didn't like the new section
added for partitioned tables examples, so I removed it and added some
explanation of the tests. I have slightly changed a few other lines. I
am planning to commit the attached tomorrow unless there are more
comments.

I have a few comments.

If a publication publishes UPDATE and/or DELETE operations ...

If we are talking about operations, use lowercase like I suggested in the
previous version. See the publish parameter [1]https://www.postgresql.org/docs/devel/sql-createpublication.html. If we are talking about
commands, the word "operations" should be removed or replaced by "commands".
The "and/or" isn't required, "or" implies the same. If you prefer "operations",
my suggestion is to adjust the last sentence that says "only INSERT" to "only
<italic>insert</italic> operation".

If the old row satisfies the row filter expression (it was sent to the
subscriber) but the new row doesn't, then from a data consistency perspective
the old row should be removed from the subscriber.

I suggested small additions to this sentence. We should at least add a comma
after "then" and "perspective". The same applies to the next paragraph too.

Regarding the "Summary", it is redundant as I said before. We already described
all 4 cases. I vote to remove it. However, if we would go with a table, I
suggest to improve the formatting: add borders, "old row" and "new row" should
be titles, "no match" and "match" should be represented by symbols ("" and "X",
for example), and "Case X" column should be removed because this extra column
adds nothing.

Regarding the "Partitioned Tables", I suggested to remove the bullets. We
generally have paragraphs with a few sentences. I tend to avoid short
paragraphs. I also didn't like the 2 bullets using almost the same words. In
the previous version, I suggested something like

If the publication contains a partitioned table, the parameter
publish_via_partition_root determines which row filter expression is used. If
the parameter publish_via_partition_root is true, the row filter expression
associated with the partitioned table is used. Otherwise, the row filter
expression associated with the individual partition is used.

will be copied. (see Section 31.3.6 for details).

There is an extra period after "copied" that should be removed. The other
option is to remove the parentheses and have another sentence for "See ...".

those expressions get OR'ed together

I prefer plain English here. This part of the sentence is also redundant with
the rest of the sentence so I suggested to remove it in the previous version.

rows that satisfy any of the row filter expressions is replicated.

instead of

those expressions get OR'ed together, so that rows satisfying any of the
expressions will be replicated.

I also didn't use a different paragraph (like I suggested in the previous
version) because we are talking about the same thing.

The bullets in the example sounds strange, that's why I suggested removing it.
We can even combine the 3 sentences into one paragraph.

The PSQL command \dRp+ shows the row filter expressions (if defined) for each
table of the publications.

Well, we don't use PSQL (upppercase) in the documentation. I suggested a
different sentence:

The psql shows the row filter expressions (if defined) for each table.

The PSQL command \d shows what publications the table is a member of, as well
as that table's row filter expression (if defined) in those publications.

It is not logical replication business to explain about psql behavior. If, for
some reason, someone decided to change it, this section will contain obsolete
information. The psql output is fine, the explanation is not. That's why I
suggested this modification.

Only the rows satisfying the t1 WHERE clause of publication p1 are
replicated.

Again, no bullets. This sentence is useful *before* the psql output. We're
presenting the results. Let's follow the pattern, describe the action and show
the results.

The UPDATE replicates the change as normal.

This sentence should be *before* the psql output (see my previous version).

Regarding the new examples (for partitioned tables), shouldn't we move the
parent / child definitions to the beginning of the Examples section? It seems
confusing use the same code snippet to show repeated table definitions
(publisher and subscriber). I checked fast and after a few seconds I realized
that the example is not wrong but the database name has a small difference (one
letter "s" x "p"). The publication and subscription definitions are fine there.

I think reusing the same tables and publication introduces complexity.
Shouldn't we just use different tables and publication to provide an "easy"
example? It would avoid DROP PUBLICATION, ALTER SUBSCRIPTION and TRUNCATE.

Do the inserts same as before.

We should indicate the node (publisher) to be clear.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#25

amit.kapila16@gmail.com

almost 4 years ago

In reply to: Euler Taveira (#24)

Re: PG DOCS - logical replication filtering

On Mon, Apr 11, 2022 at 11:03 PM Euler Taveira <euler@eulerto.com> wrote:

On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:

Regarding the new examples (for partitioned tables), shouldn't we move the
parent / child definitions to the beginning of the Examples section?

I think that will make examples less clear.

It seems
confusing use the same code snippet to show repeated table definitions
(publisher and subscriber). I checked fast and after a few seconds I realized
that the example is not wrong but the database name has a small difference (one
letter "s" x "p").

Can you be more specific? AFAICS, dbname used (testpub) is same.

The publication and subscription definitions are fine there.

I think reusing the same tables and publication introduces complexity.
Shouldn't we just use different tables and publication to provide an "easy"
example? It would avoid DROP PUBLICATION, ALTER SUBSCRIPTION and TRUNCATE.

I don't know. I find the current way understandable. I feel using
different names won't gain much and make the example difficult to
follow.

--
With Regards,
Amit Kapila.

#26

smithpb2250@gmail.com

almost 4 years ago

In reply to: Amit Kapila (#23)

1 attachment(s)

Re: PG DOCS - logical replication filtering

PSA patch v9 which addresses most of Euler's review comments [1]/messages/by-id/1c78ebd4-b38d-4b5d-a6ea-d583efe87d97@www.fastmail.com

------
[1]: /messages/by-id/1c78ebd4-b38d-4b5d-a6ea-d583efe87d97@www.fastmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v9-0001-Add-additional-documentation-for-row-filters.patchapplication/octet-stream; name=v9-0001-Add-additional-documentation-for-row-filters.patchDownload

From c3e79a600199941d761c4cdc27e0737285cb046e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 12 Apr 2022 17:46:45 +1000
Subject: [PATCH v9] Add additional documentation for row filters.

Commit 52e4f0cd47 added a feature to allow specifying row filters for
logical replication of tables. This patch adds detailed documentation on
that feature including examples to make it easier for users to understand.

Author: Peter Smith, Euler Taveira
Reviewed By: Greg Nancarrow, Aleksander Alekseev, Amit Kapila, Ajin Cherian, Alvaro Herrera
---
 doc/src/sgml/logical-replication.sgml    | 547 +++++++++++++++++++++++++++++++
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 549 insertions(+)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 555fbd7..c2441f6 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -118,6 +118,8 @@
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
    particular event types.  By default, all operation types are replicated.
+   (Row filters have no effect for <command>TRUNCATE</command>. See 
+   <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +319,551 @@
   </sect2>
  </sect1>
 
+ <sect1 id="logical-replication-row-filter">
+  <title>Row Filters</title>
+
+  <para>
+   By default, all data from all published tables will be replicated to the
+   appropriate subscribers. The replicated data can be reduced by using a 
+   <firstterm>row filter</firstterm>. A user might choose to use row filters
+   for behavioral, security or performance reasons. If a published table sets a
+   row filter, a row is replicated only if its data satisfies the row filter
+   expression. This allows a set of tables to be partially replicated. The row
+   filter is defined per table. Use a <literal>WHERE</literal> clause after the
+   table name for each published table that requires data to be filtered out.
+   The <literal>WHERE</literal> clause must be enclosed by parentheses. See
+   <xref linkend="sql-createpublication"/> for details.
+  </para>
+
+  <sect2 id="logical-replication-row-filter-rules">
+   <title>Row Filter Rules</title>
+
+   <para>
+    Row filters are applied <emphasis>before</emphasis> publishing the changes.
+    If the row filter evaluates to <literal>false</literal> or <literal>NULL</literal>
+    then the row is not replicated. The <literal>WHERE</literal> clause expression
+    is evaluated with the same role used for the replication connection (i.e.
+    the role specified in the <literal>CONNECTION</literal> clause of the
+    <xref linkend="sql-createsubscription"/>). Row filters have no effect for
+    <command>TRUNCATE</command> command.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-restrictions">
+   <title>Expression Restrictions</title>
+
+   <para>
+    The <literal>WHERE</literal> clause allows only simple expressions. It
+    cannot contain user-defined functions, operators, types, and collations,
+    system column references or non-immutable built-in functions.
+   </para>
+
+   <para>
+    If a publication publishes <command>UPDATE</command> and/or
+    <command>DELETE</command> operations, the row filter <literal>WHERE</literal> clause
+    must contain only columns that are covered by the replica identity (see
+    <xref linkend="sql-altertable-replica-identity"/>). If a publication publishes only
+    <command>INSERT</command>, the row filter <literal>WHERE</literal> clause
+    can use any column.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-transformations">
+   <title>UPDATE Transformations</title>
+
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. using the data
+    before and after the update). If both evaluations are <literal>true</literal>,
+    it replicates the <command>UPDATE</command> change. If both evaluations are
+    <literal>false</literal>, it doesn't replicate the change. If only one of
+    the old/new rows matches the row filter expression, the <command>UPDATE</command>
+    is transformed to <command>INSERT</command> or <command>DELETE</command>, to
+    avoid any data inconsistency. The row on the subscriber should reflect what
+    is defined by the row filter expression on the publisher.
+   </para>
+
+   <para>
+    If the old row satisfies the row filter expression (it was sent to the
+    subscriber) but the new row doesn't, then, from a data consistency
+    perspective the old row should be removed from the subscriber.
+    So the <command>UPDATE</command> is transformed into a <command>DELETE</command>.
+   </para>
+
+   <para>
+    If the old row doesn't satisfy the row filter expression (it wasn't sent
+    to the subscriber) but the new row does, then, from a data consistency
+    perspective the new row should be added to the subscriber.
+    So the <command>UPDATE</command> is transformed into an <command>INSERT</command>.
+   </para>
+
+   <para>
+Summary:
+<synopsis>
+Old-row       New-row            Transformation
+(no match)    (no match)   -->   (drop change)
+(no match)    (match)      -->   INSERT
+(match)       (no match)   -->   DELETE
+(match)       (match)      -->   UPDATE
+</synopsis></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-partitioned-table">
+   <title>Partitioned Tables</title>
+
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used. If <literal>publish_via_partition_root</literal> is <literal>true</literal>,
+    the <emphasis>root partitioned table's</emphasis> row filter is used. Otherwise,
+    if <literal>publish_via_partition_root</literal> is <literal>false</literal>
+    (default), each <emphasis>partition's</emphasis> row filter is used.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-initial-data-sync">
+   <title>Initial Data Synchronization</title>
+
+   <para>
+    If the subscription requires copying pre-existing table data
+    and a publication contains <literal>WHERE</literal> clauses, only data that
+    satisfies the row filter expressions is copied to the subscriber.
+   </para>
+
+   <para>
+    If the subscription has several publications in which a table has been
+    published with different <literal>WHERE</literal> clauses, rows that satisfy
+    <emphasis>any</emphasis> of the expressions will be copied. See
+    <xref linkend="logical-replication-row-filter-combining"/> for details.
+   </para>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-combining">
+   <title>Combining Multiple Row Filters</title>
+
+   <para>
+    If the subscription has several publications in which the same table has
+    been published with different row filters (for the same <literal>publish</literal>
+    operation), those expressions get OR'ed together, so that rows satisfying
+    <emphasis>any</emphasis> of the expressions will be replicated. This means all
+    the other row filters for the same table become redundant if:
+    <itemizedlist>
+     <listitem>
+      <para>
+       one of the publications has no row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using <literal>FOR ALL TABLES</literal>.
+       This clause does not allow row filters.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES IN SCHEMA</literal> and the table belongs to
+       the referred schema. This clause does not allow row filters.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-examples">
+   <title>Examples</title>
+
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+test_pub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+test_pub=# CREATE TABLE t2(d int, e int, f int, primary key(d));
+CREATE TABLE
+test_pub=# CREATE TABLE t3(g int, h int, i int, primary key(g));
+CREATE TABLE
+</programlisting></para>
+
+   <para>
+    Create some publications. Publication <literal>p1</literal> has one table
+    (<literal>t1</literal>) and that table has a row filter. Publication
+    <literal>p2</literal> has two tables. Table <literal>t1</literal> has no row
+    filter, and table <literal>t2</literal> has a row filter. Publication
+    <literal>p3</literal> has two tables, and both of them have a row filter.
+<programlisting>
+test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting></para>
+
+   <para>
+    The psql command <command>\dRp+</command> shows the row filter expressions
+    (if defined) for each of the publications.
+<programlisting>
+test_pub=# \dRp+
+                               Publication p1
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1" WHERE ((a > 5) AND (c = 'NSW'::text))
+
+                               Publication p2
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1"
+    "public.t2" WHERE (e = 99)
+
+                               Publication p3
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t2" WHERE (d = 10)
+    "public.t3" WHERE (g = 10)
+</programlisting></para>
+
+   <para>
+    The psql command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications. See that table <literal>t1</literal> is a member of two
+    publications, but has a row filter only in <literal>p1</literal>. See that
+    table <literal>t2</literal> is a member of two publications, and has a
+    different row filter in each of them.
+<programlisting>
+test_pub=# \d t1
+                 Table "public.t1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           | not null | 
+ b      | integer |           |          | 
+ c      | text    |           | not null | 
+Indexes:
+    "t1_pkey" PRIMARY KEY, btree (a, c)
+Publications:
+    "p1" WHERE ((a > 5) AND (c = 'NSW'::text))
+    "p2"
+
+test_pub=# \d t2
+                 Table "public.t2"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ d      | integer |           | not null | 
+ e      | integer |           |          | 
+ f      | integer |           |          | 
+Indexes:
+    "t2_pkey" PRIMARY KEY, btree (d)
+Publications:
+    "p2" WHERE (e = 99)
+    "p3" WHERE (d = 10)
+
+test_pub=# \d t3
+                 Table "public.t3"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ g      | integer |           | not null | 
+ h      | integer |           |          | 
+ i      | integer |           |          | 
+Indexes:
+    "t3_pkey" PRIMARY KEY, btree (g)
+Publications:
+    "p3" WHERE (g = 10)
+</programlisting></para>
+
+   <para>
+    On the subscriber node, create a table <literal>t1</literal> with the same
+    definition as the one on the publisher, and also create the subscription
+    <literal>s1</literal> that subscribes to the publication <literal>p1</literal>.
+<programlisting>
+test_sub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+test_sub=# CREATE SUBSCRIPTION s1
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=s1'
+test_sub-# PUBLICATION p1;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Insert some rows. Only the rows satisfying the <literal>t1 WHERE</literal>
+    clause of publication <literal>p1</literal> are replicated.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (2, 102, 'NSW');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (3, 103, 'QLD');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (4, 104, 'VIC');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (5, 105, 'ACT');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (6, 106, 'NSW');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (7, 107, 'NT');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (8, 108, 'QLD');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (9, 109, 'NSW');
+INSERT 0 1
+
+test_pub=# SELECT * FROM t1; 
+ a |  b  |  c
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 6 | 106 | NSW
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 6 | 106 | NSW
+ 9 | 109 | NSW
+(2 rows)
+</programlisting></para>
+
+   <para>
+    Update some data, where the old and new row values both
+    satisfy the <literal>t1 WHERE</literal> clause of publication
+    <literal>p1</literal>. The <command>UPDATE</command> replicates
+    the change as normal.
+<programlisting>
+test_pub=# UPDATE t1 SET b = 999 WHERE a = 6;
+UPDATE 1
+
+test_pub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(2 rows)
+</programlisting></para>
+
+   <para>
+    Update some data, where the old row values did not satisfy
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do satisfy it. The <command>UPDATE</command> is
+    transformed into an <command>INSERT</command> and the change is replicated.
+    See the new row on the subscriber.
+<programlisting>
+test_pub=# UPDATE t1 SET a = 555 WHERE a = 2;
+UPDATE 1
+
+test_pub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(3 rows)
+</programlisting></para>
+
+   <para>
+    Update some data, where the old row values satisfied
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do not satisfy it. The <command>UPDATE</command> is
+    transformed into a <command>DELETE</command> and the change is replicated.
+    See that the row is removed from the subscriber.
+<programlisting>
+test_pub=# UPDATE t1 SET c = 'VIC' WHERE a = 9;
+UPDATE 1
+
+test_pub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   6 | 999 | NSW
+ 555 | 102 | NSW
+   9 | 109 | VIC
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(2 rows)
+</programlisting></para>
+
+   <para>
+    The following examples show how the publication parameter
+    <literal>publish_via_partition_root</literal> determines whether the row
+    filter of the parent or child table will be used in the case of partitioned
+    tables.
+   </para>
+
+   <para>
+    Create the same partitioned tables on the publisher and subscriber.
+<programlisting>
+test_pub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
+CREATE TABLE
+test_pub=# CREATE TABLE child PARTITION OF parent DEFAULT;
+CREATE TABLE
+</programlisting>
+<programlisting>
+test_sub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
+CREATE TABLE
+test_sub=# CREATE TABLE child PARTITION OF parent DEFAULT;
+CREATE TABLE
+</programlisting></para>
+
+   <para>
+    Create a publication <literal>p4</literal>, and then subscribe to it. The
+    publication parameter <literal>publish_via_partition_root</literal> is set
+    as true. There are row filters defined on both the partitioned table
+    (<literal>parent</literal>), and on the partition (<literal>child</literal>).
+<programlisting>
+test_pub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a &lt; 5), child WHERE (a >= 5)
+test_pub-# WITH (publish_via_partition_root=true);
+CREATE PUBLICATION
+</programlisting>
+<programlisting>
+test_sub=# CREATE SUBSCRIPTION s4
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=s4'
+test_sub-# PUBLICATION p4;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Insert some values directly into the <literal>parent</literal> and
+    <literal>child</literal> tables. They replicate using the row filter of
+    <literal>parent</literal> (because <literal>publish_via_partition_root</literal>
+    is true).
+<programlisting>
+test_pub=# INSERT INTO parent VALUES (2), (4), (6);
+INSERT 0 3
+test_pub=# INSERT INTO child VALUES (3), (5), (7);
+INSERT 0 3
+
+test_pub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+(6 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+(3 rows)
+</programlisting></para>
+
+   <para>
+    Repeat the same test, but with a different value for <literal>publish_via_partition_root</literal>.
+    The publication parameter <literal>publish_via_partition_root</literal> is
+    set as false. A row filter is defined on the partition (<literal>child</literal>).
+<programlisting>
+test_pub=# DROP PUBLICATION p4;
+DROP PUBLICATION
+test_pub=# CREATE PUBLICATION p4 FOR TABLE parent, child WHERE (a >= 5)
+test_pub-# WITH (publish_via_partition_root=false);
+CREATE PUBLICATION
+</programlisting>
+<programlisting>
+test_sub=# ALTER SUBSCRIPTION s4 REFRESH PUBLICATION;
+ALTER SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Do the inserts on the publisher same as before. They replicate using the
+    row filter of <literal>child</literal> (because
+    <literal>publish_via_partition_root</literal> is false).
+<programlisting>
+test_pub=# TRUNCATE parent;
+TRUNCATE TABLE
+test_pub=# INSERT INTO parent VALUES (2), (4), (6);
+INSERT 0 3
+test_pub=# INSERT INTO child VALUES (3), (5), (7);
+INSERT 0 3
+
+test_pub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+(6 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM child ORDER BY a;
+ a 
+---
+ 5
+ 6
+ 7
+(3 rows)
+</programlisting></para>
+
+  </sect2>
+
+ </sect1>
+
  <sect1 id="logical-replication-conflicts">
   <title>Conflicts</title>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index fb2d013..23d883c 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -254,6 +254,8 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    <literal>publish_via_partition_root</literal> determines if it uses the
    partition's row filter (if the parameter is false, the default) or the root
    partitioned table's row filter.
+   See <xref linkend="logical-replication-row-filter"/> for details about row
+   filters.
   </para>
 
   <para>
-- 
1.8.3.1

#27

smithpb2250@gmail.com

almost 4 years ago

In reply to: Euler Taveira (#24)

Re: PG DOCS - logical replication filtering

On Tue, Apr 12, 2022 at 3:33 AM Euler Taveira <euler@eulerto.com> wrote:

On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:

On Mon, Apr 11, 2022 at 12:39 PM Peter Smith <smithpb2250@gmail.com> wrote:

On Fri, Apr 8, 2022 at 4:12 PM Peter Smith <smithpb2250@gmail.com> wrote:

OK. Added in v7 [1]

Thanks, this looks mostly good to me. I didn't like the new section
added for partitioned tables examples, so I removed it and added some
explanation of the tests. I have slightly changed a few other lines. I
am planning to commit the attached tomorrow unless there are more
comments.

I have a few comments.

Thanks for your review comments! I addressed most of them as suggested
- see the details below.

If a publication publishes UPDATE and/or DELETE operations ...

If we are talking about operations, use lowercase like I suggested in the
previous version. See the publish parameter [1]. If we are talking about
commands, the word "operations" should be removed or replaced by "commands".
The "and/or" isn't required, "or" implies the same. If you prefer "operations",
my suggestion is to adjust the last sentence that says "only INSERT" to "only
<italic>insert</italic> operation".

Not changed. Because in fact, I copied most of this sentence
(including the uppercase, "operations", "and/or") from existing
documentation [1]https://www.postgresql.org/docs/current/sql-createpublication.html
e.g. see "The tables added to a publication that publishes UPDATE
and/or DELETE operations must ..."
[1]: https://www.postgresql.org/docs/current/sql-createpublication.html

If the old row satisfies the row filter expression (it was sent to the
subscriber) but the new row doesn't, then from a data consistency perspective
the old row should be removed from the subscriber.

I suggested small additions to this sentence. We should at least add a comma
after "then" and "perspective". The same applies to the next paragraph too.

Modified the commas in [v9] as suggested.

Regarding the "Summary", it is redundant as I said before. We already described
all 4 cases. I vote to remove it. However, if we would go with a table, I
suggest to improve the formatting: add borders, "old row" and "new row" should
be titles, "no match" and "match" should be represented by symbols ("" and "X",
for example), and "Case X" column should be removed because this extra column
adds nothing.

Yeah, I know the information is the same in the summary and in the
text. Personally, I find big slabs of technical text difficult to
digest, so I'd have to spend 5 minutes of careful reading and drawing
the exact same summary on a piece of paper anyway just to visualize
what the text says. The summary makes it easy to understand at a
glance. But I have modified the summary in [v9] to remove the "case"
and to add other column headings as suggested.

Regarding the "Partitioned Tables", I suggested to remove the bullets. We
generally have paragraphs with a few sentences. I tend to avoid short
paragraphs. I also didn't like the 2 bullets using almost the same words. In
the previous version, I suggested something like

If the publication contains a partitioned table, the parameter
publish_via_partition_root determines which row filter expression is used. If
the parameter publish_via_partition_root is true, the row filter expression
associated with the partitioned table is used. Otherwise, the row filter
expression associated with the individual partition is used.

Modified in [v9] to remove the bullets.

will be copied. (see Section 31.3.6 for details).

There is an extra period after "copied" that should be removed. The other
option is to remove the parentheses and have another sentence for "See ...".

Modified in [v9] as suggested.

those expressions get OR'ed together

I prefer plain English here. This part of the sentence is also redundant with
the rest of the sentence so I suggested to remove it in the previous version.

rows that satisfy any of the row filter expressions is replicated.

instead of

those expressions get OR'ed together, so that rows satisfying any of the
expressions will be replicated.

Not changed. The readers of this docs page are all users who will be
familiar with the filter expressions, so I felt the "OR'ed together"
part will be perfectly clear to the intended audience.

I also didn't use a different paragraph (like I suggested in the previous
version) because we are talking about the same thing.

Modified in [v9] to use a single paragraph.

The bullets in the example sounds strange, that's why I suggested removing it.
We can even combine the 3 sentences into one paragraph.

Modified [v9] so the whole example now has no bullets. Also combined
all these 3 sentences as suggested.

The PSQL command \dRp+ shows the row filter expressions (if defined) for each
table of the publications.

Well, we don't use PSQL (upppercase) in the documentation. I suggested a
different sentence:

The psql shows the row filter expressions (if defined) for each table.

Modified the sentence in [v9]. Now it uses lowercase psql.

The PSQL command \d shows what publications the table is a member of, as well
as that table's row filter expression (if defined) in those publications.

It is not logical replication business to explain about psql behavior. If, for
some reason, someone decided to change it, this section will contain obsolete
information. The psql output is fine, the explanation is not. That's why I
suggested this modification.

Modified [v9] this sentence also to use lowercase psql.

But I did not understand your point about “If, for some reason,
someone decided to change it, this section will contain obsolete
information”, because IIUC that will be equally true for both the
explanation and the output, so I did not understand why you say "psql
output is fine, the explanation is not". It is the business of this
documentation to help the user to know how and where they can find the
row filter information they may need to know.

Only the rows satisfying the t1 WHERE clause of publication p1 are
replicated.

Again, no bullets. This sentence is useful *before* the psql output. We're
presenting the results. Let's follow the pattern, describe the action and show
the results.

OK. Modified all the [v9] example now has all the bullets removed and
follows the suggested pattern (e.g. where the notes always come
*before* the results)

The UPDATE replicates the change as normal.

This sentence should be *before* the psql output (see my previous version).

Modified [v9] as suggested.

Regarding the new examples (for partitioned tables), shouldn't we move the
parent / child definitions to the beginning of the Examples section?

Not changed. IMO if we moved those CREATE TABLE as suggested then they
will then be too far away from where they are being used.

It seems
confusing use the same code snippet to show repeated table definitions
(publisher and subscriber). I checked fast and after a few seconds I realized
that the example is not wrong but the database name has a small difference (one
letter "s" x "p"). The publication and subscription definitions are fine there.

Not changed. The publisher and subscriber programlistings are always
separated. If you are looking at the rendered HTML I think it is quite
clear that one is at the publisher and one is at the subscriber. OTOH,
if we omitted creating the tables on the subscriber then I think that
really would cause some confusion.

I think reusing the same tables and publication introduces complexity.
Shouldn't we just use different tables and publication to provide an "easy"
example? It would avoid DROP PUBLICATION, ALTER SUBSCRIPTION and TRUNCATE.

Not changed. Those same tables were re-used *deliberately* so that the
examples could use identical inserts, and to emphasize that the
different behaviour was caused only by the
"publish_via_partition_root" setting.

Do the inserts same as before.

We should indicate the node (publisher) to be clear.

OK. Modified [v9] as suggested.

------
[v9] /messages/by-id/CAHut+PvYqo77rwg_vHC=OyQ7hCHZGVm=Ni+JQbf8VyBz8hoo2w@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#28

euler@eulerto.com

almost 4 years ago

In reply to: Peter Smith (#27)

Re: PG DOCS - logical replication filtering

On Tue, Apr 12, 2022, at 5:30 AM, Peter Smith wrote:

Not changed. Because in fact, I copied most of this sentence
(including the uppercase, "operations", "and/or") from existing
documentation [1]
e.g. see "The tables added to a publication that publishes UPDATE
and/or DELETE operations must ..."
[1] https://www.postgresql.org/docs/current/sql-createpublication.html

Hmm. You are checking the Notes. I'm referring the the publish parameter. IMO
this sentence should use operations in lowercase letters too because even if
you create it with uppercase letters, Postgres will provide a lowercase word
when you dump it.

Yeah, I know the information is the same in the summary and in the
text. Personally, I find big slabs of technical text difficult to
digest, so I'd have to spend 5 minutes of careful reading and drawing
the exact same summary on a piece of paper anyway just to visualize
what the text says. The summary makes it easy to understand at a
glance. But I have modified the summary in [v9] to remove the "case"
and to add other column headings as suggested.

Isn't it better to use a table instead of synopsis?

Not changed. The readers of this docs page are all users who will be
familiar with the filter expressions, so I felt the "OR'ed together"
part will be perfectly clear to the intended audience.

If you want to keep it, change it to "ORed". It is used in indices.sgml. Let's
keep the consistence.

But I did not understand your point about “If, for some reason,
someone decided to change it, this section will contain obsolete
information”, because IIUC that will be equally true for both the
explanation and the output, so I did not understand why you say "psql
output is fine, the explanation is not". It is the business of this
documentation to help the user to know how and where they can find the
row filter information they may need to know.

You are describing a psql command here. My point is keep psql explanation in
the psql section. This section is to describe the row filter feature. If we
start describing features in other sections, we will have outdated information
when the referred feature is changed and someone fails to find all references.
I tend to concentrate detailed explanation in the feature section. If I have to
add links in other sections, I use "Seee Section 1.23 for details ...".

Not changed. The publisher and subscriber programlistings are always
separated. If you are looking at the rendered HTML I think it is quite
clear that one is at the publisher and one is at the subscriber. OTOH,
if we omitted creating the tables on the subscriber then I think that
really would cause some confusion.

The difference is extra space. By default, the CSS does not include a border
for programlisting. That's why I complained about it. I noticed that the
website CSS includes it. However, the PDF will not include the border. I would
add a separate description for the subscriber just to be clear.

One last suggestion, you are using identifiers in uppercase letters but
"primary key" is in lowercase.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#29

smithpb2250@gmail.com

almost 4 years ago

In reply to: Euler Taveira (#28)

1 attachment(s)

Re: PG DOCS - logical replication filtering

PSA patch v10 which addresses the remaining review comments from Euler [1]/messages/by-id/3cd8d622-6a26-4eaf-a5aa-ac78030e5f50@www.fastmail.com

------
[1]: /messages/by-id/3cd8d622-6a26-4eaf-a5aa-ac78030e5f50@www.fastmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

v10-0001-Add-additional-documentation-for-row-filters.patchapplication/octet-stream; name=v10-0001-Add-additional-documentation-for-row-filters.patchDownload

From c78599c747b1e72cde5ca56da5eae481fd9f3bd8 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 13 Apr 2022 13:20:09 +1000
Subject: [PATCH v10] Add additional documentation for row filters.

Commit 52e4f0cd47 added a feature to allow specifying row filters for
logical replication of tables. This patch adds detailed documentation on
that feature including examples to make it easier for users to understand.

Author: Peter Smith, Euler Taveira
Reviewed By: Greg Nancarrow, Aleksander Alekseev, Amit Kapila, Ajin Cherian, Alvaro Herrera
---
 doc/src/sgml/logical-replication.sgml    | 562 +++++++++++++++++++++++++++++++
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 564 insertions(+)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 1126ce4..bc4d700 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -118,6 +118,8 @@
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and <command>TRUNCATE</command>, similar to how triggers are fired by
    particular event types.  By default, all operation types are replicated.
+   (Row filters have no effect for <command>TRUNCATE</command>. See 
+   <xref linkend="logical-replication-row-filter"/>).
   </para>
 
   <para>
@@ -317,6 +319,566 @@
   </sect2>
  </sect1>
 
+ <sect1 id="logical-replication-row-filter">
+  <title>Row Filters</title>
+
+  <para>
+   By default, all data from all published tables will be replicated to the
+   appropriate subscribers. The replicated data can be reduced by using a 
+   <firstterm>row filter</firstterm>. A user might choose to use row filters
+   for behavioral, security or performance reasons. If a published table sets a
+   row filter, a row is replicated only if its data satisfies the row filter
+   expression. This allows a set of tables to be partially replicated. The row
+   filter is defined per table. Use a <literal>WHERE</literal> clause after the
+   table name for each published table that requires data to be filtered out.
+   The <literal>WHERE</literal> clause must be enclosed by parentheses. See
+   <xref linkend="sql-createpublication"/> for details.
+  </para>
+
+  <sect2 id="logical-replication-row-filter-rules">
+   <title>Row Filter Rules</title>
+
+   <para>
+    Row filters are applied <emphasis>before</emphasis> publishing the changes.
+    If the row filter evaluates to <literal>false</literal> or <literal>NULL</literal>
+    then the row is not replicated. The <literal>WHERE</literal> clause expression
+    is evaluated with the same role used for the replication connection (i.e.
+    the role specified in the <literal>CONNECTION</literal> clause of the
+    <xref linkend="sql-createsubscription"/>). Row filters have no effect for
+    <command>TRUNCATE</command> command.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-restrictions">
+   <title>Expression Restrictions</title>
+
+   <para>
+    The <literal>WHERE</literal> clause allows only simple expressions. It
+    cannot contain user-defined functions, operators, types, and collations,
+    system column references or non-immutable built-in functions.
+   </para>
+
+   <para>
+    If a publication publishes <command>UPDATE</command> or
+    <command>DELETE</command> operations, the row filter <literal>WHERE</literal>
+    clause must contain only columns that are covered by the replica identity
+    (see <xref linkend="sql-altertable-replica-identity"/>). If a publication
+    publishes only <command>INSERT</command> operations, the row filter
+    <literal>WHERE</literal> clause can use any column.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-transformations">
+   <title>UPDATE Transformations</title>
+
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. using the data
+    before and after the update). If both evaluations are <literal>true</literal>,
+    it replicates the <command>UPDATE</command> change. If both evaluations are
+    <literal>false</literal>, it doesn't replicate the change. If only one of
+    the old/new rows matches the row filter expression, the <command>UPDATE</command>
+    is transformed to <command>INSERT</command> or <command>DELETE</command>, to
+    avoid any data inconsistency. The row on the subscriber should reflect what
+    is defined by the row filter expression on the publisher.
+   </para>
+
+   <para>
+    If the old row satisfies the row filter expression (it was sent to the
+    subscriber) but the new row doesn't, then, from a data consistency
+    perspective the old row should be removed from the subscriber.
+    So the <command>UPDATE</command> is transformed into a <command>DELETE</command>.
+   </para>
+
+   <para>
+    If the old row doesn't satisfy the row filter expression (it wasn't sent
+    to the subscriber) but the new row does, then, from a data consistency
+    perspective the new row should be added to the subscriber.
+    So the <command>UPDATE</command> is transformed into an <command>INSERT</command>.
+   </para>
+
+   <table>
+    <title><command>UPDATE</command> Transformation Summary</title>
+    <tgroup cols="3">
+    <thead>
+     <row>
+      <entry>Old row</entry><entry>New row</entry><entry>Transformation</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry>no match</entry><entry>no match</entry><entry>don't replicate</entry>
+     </row>
+     <row>
+      <entry>no match</entry><entry>match</entry><entry><literal>INSERT</literal></entry>
+     </row>
+     <row>
+      <entry>match</entry><entry>no match</entry><entry><literal>DELETE</literal></entry>
+     </row>
+     <row>
+      <entry>match</entry><entry>match</entry><entry><literal>UPDATE</literal></entry>
+     </row>
+    </tbody>
+   </tgroup>
+   </table>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-partitioned-table">
+   <title>Partitioned Tables</title>
+
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used. If <literal>publish_via_partition_root</literal> is <literal>true</literal>,
+    the <emphasis>root partitioned table's</emphasis> row filter is used. Otherwise,
+    if <literal>publish_via_partition_root</literal> is <literal>false</literal>
+    (default), each <emphasis>partition's</emphasis> row filter is used.
+   </para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-initial-data-sync">
+   <title>Initial Data Synchronization</title>
+
+   <para>
+    If the subscription requires copying pre-existing table data
+    and a publication contains <literal>WHERE</literal> clauses, only data that
+    satisfies the row filter expressions is copied to the subscriber.
+   </para>
+
+   <para>
+    If the subscription has several publications in which a table has been
+    published with different <literal>WHERE</literal> clauses, rows that satisfy
+    <emphasis>any</emphasis> of the expressions will be copied. See
+    <xref linkend="logical-replication-row-filter-combining"/> for details.
+   </para>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-combining">
+   <title>Combining Multiple Row Filters</title>
+
+   <para>
+    If the subscription has several publications in which the same table has
+    been published with different row filters (for the same <literal>publish</literal>
+    operation), those expressions get ORed together, so that rows satisfying
+    <emphasis>any</emphasis> of the expressions will be replicated. This means all
+    the other row filters for the same table become redundant if:
+    <itemizedlist>
+     <listitem>
+      <para>
+       one of the publications has no row filter.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using <literal>FOR ALL TABLES</literal>.
+       This clause does not allow row filters.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       one of the publications was created using
+       <literal>FOR ALL TABLES IN SCHEMA</literal> and the table belongs to
+       the referred schema. This clause does not allow row filters.
+      </para>
+     </listitem>
+    </itemizedlist></para>
+
+  </sect2>
+
+  <sect2 id="logical-replication-row-filter-examples">
+   <title>Examples</title>
+
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+test_pub=# CREATE TABLE t1(a int, b int, c text, PRIMARY KEY(a,c));
+CREATE TABLE
+test_pub=# CREATE TABLE t2(d int, e int, f int, PRIMARY KEY(d));
+CREATE TABLE
+test_pub=# CREATE TABLE t3(g int, h int, i int, PRIMARY KEY(g));
+CREATE TABLE
+</programlisting></para>
+
+   <para>
+    Create some publications. Publication <literal>p1</literal> has one table
+    (<literal>t1</literal>) and that table has a row filter. Publication
+    <literal>p2</literal> has two tables. Table <literal>t1</literal> has no row
+    filter, and table <literal>t2</literal> has a row filter. Publication
+    <literal>p3</literal> has two tables, and both of them have a row filter.
+<programlisting>
+test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+test_pub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting></para>
+
+   <para>
+    <command>psql</command> can be used to show the row filter expressions (if
+    defined) for each publication.
+<programlisting>
+test_pub=# \dRp+
+                               Publication p1
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1" WHERE ((a > 5) AND (c = 'NSW'::text))
+
+                               Publication p2
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t1"
+    "public.t2" WHERE (e = 99)
+
+                               Publication p3
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+----------+------------+---------+---------+---------+-----------+----------
+ postgres | f          | t       | t       | t       | t         | f
+Tables:
+    "public.t2" WHERE (d = 10)
+    "public.t3" WHERE (g = 10)
+</programlisting></para>
+
+   <para>
+    <command>psql</command> can be used to show the row filter expressions (if
+    defined) for each table. See that table <literal>t1</literal> is a member
+    of two publications, but has a row filter only in <literal>p1</literal>.
+    See that table <literal>t2</literal> is a member of two publications, and
+    has a different row filter in each of them.
+<programlisting>
+test_pub=# \d t1
+                 Table "public.t1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           | not null | 
+ b      | integer |           |          | 
+ c      | text    |           | not null | 
+Indexes:
+    "t1_pkey" PRIMARY KEY, btree (a, c)
+Publications:
+    "p1" WHERE ((a > 5) AND (c = 'NSW'::text))
+    "p2"
+
+test_pub=# \d t2
+                 Table "public.t2"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ d      | integer |           | not null | 
+ e      | integer |           |          | 
+ f      | integer |           |          | 
+Indexes:
+    "t2_pkey" PRIMARY KEY, btree (d)
+Publications:
+    "p2" WHERE (e = 99)
+    "p3" WHERE (d = 10)
+
+test_pub=# \d t3
+                 Table "public.t3"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ g      | integer |           | not null | 
+ h      | integer |           |          | 
+ i      | integer |           |          | 
+Indexes:
+    "t3_pkey" PRIMARY KEY, btree (g)
+Publications:
+    "p3" WHERE (g = 10)
+</programlisting></para>
+
+   <para>
+    On the subscriber node, create a table <literal>t1</literal> with the same
+    definition as the one on the publisher, and also create the subscription
+    <literal>s1</literal> that subscribes to the publication <literal>p1</literal>.
+<programlisting>
+test_sub=# CREATE TABLE t1(a int, b int, c text, PRIMARY KEY(a,c));
+CREATE TABLE
+test_sub=# CREATE SUBSCRIPTION s1
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=s1'
+test_sub-# PUBLICATION p1;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Insert some rows. Only the rows satisfying the <literal>t1 WHERE</literal>
+    clause of publication <literal>p1</literal> are replicated.
+<programlisting>
+test_pub=# INSERT INTO t1 VALUES (2, 102, 'NSW');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (3, 103, 'QLD');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (4, 104, 'VIC');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (5, 105, 'ACT');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (6, 106, 'NSW');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (7, 107, 'NT');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (8, 108, 'QLD');
+INSERT 0 1
+test_pub=# INSERT INTO t1 VALUES (9, 109, 'NSW');
+INSERT 0 1
+
+test_pub=# SELECT * FROM t1; 
+ a |  b  |  c
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 6 | 106 | NSW
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 6 | 106 | NSW
+ 9 | 109 | NSW
+(2 rows)
+</programlisting></para>
+
+   <para>
+    Update some data, where the old and new row values both
+    satisfy the <literal>t1 WHERE</literal> clause of publication
+    <literal>p1</literal>. The <command>UPDATE</command> replicates
+    the change as normal.
+<programlisting>
+test_pub=# UPDATE t1 SET b = 999 WHERE a = 6;
+UPDATE 1
+
+test_pub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 2 | 102 | NSW
+ 3 | 103 | QLD
+ 4 | 104 | VIC
+ 5 | 105 | ACT
+ 7 | 107 | NT
+ 8 | 108 | QLD
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+ a |  b  |  c  
+---+-----+-----
+ 9 | 109 | NSW
+ 6 | 999 | NSW
+(2 rows)
+</programlisting></para>
+
+   <para>
+    Update some data, where the old row values did not satisfy
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do satisfy it. The <command>UPDATE</command> is
+    transformed into an <command>INSERT</command> and the change is replicated.
+    See the new row on the subscriber.
+<programlisting>
+test_pub=# UPDATE t1 SET a = 555 WHERE a = 2;
+UPDATE 1
+
+test_pub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   9 | 109 | NSW
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(3 rows)
+</programlisting></para>
+
+   <para>
+    Update some data, where the old row values satisfied
+    the <literal>t1 WHERE</literal> clause of publication <literal>p1</literal>,
+    but the new row values do not satisfy it. The <command>UPDATE</command> is
+    transformed into a <command>DELETE</command> and the change is replicated.
+    See that the row is removed from the subscriber.
+<programlisting>
+test_pub=# UPDATE t1 SET c = 'VIC' WHERE a = 9;
+UPDATE 1
+
+test_pub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   3 | 103 | QLD
+   4 | 104 | VIC
+   5 | 105 | ACT
+   7 | 107 | NT
+   8 | 108 | QLD
+   6 | 999 | NSW
+ 555 | 102 | NSW
+   9 | 109 | VIC
+(8 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM t1;
+  a  |  b  |  c  
+-----+-----+-----
+   6 | 999 | NSW
+ 555 | 102 | NSW
+(2 rows)
+</programlisting></para>
+
+   <para>
+    The following examples show how the publication parameter
+    <literal>publish_via_partition_root</literal> determines whether the row
+    filter of the parent or child table will be used in the case of partitioned
+    tables.
+   </para>
+
+   <para>
+    Create a partitioned table on the publisher.
+<programlisting>
+test_pub=# CREATE TABLE parent(a int PRIMARY KEY) PARTITION BY RANGE(a);
+CREATE TABLE
+test_pub=# CREATE TABLE child PARTITION OF parent DEFAULT;
+CREATE TABLE
+</programlisting>
+   Create the same tables on the subscriber.
+<programlisting>
+test_sub=# CREATE TABLE parent(a int PRIMARY KEY) PARTITION BY RANGE(a);
+CREATE TABLE
+test_sub=# CREATE TABLE child PARTITION OF parent DEFAULT;
+CREATE TABLE
+</programlisting></para>
+
+   <para>
+    Create a publication <literal>p4</literal>, and then subscribe to it. The
+    publication parameter <literal>publish_via_partition_root</literal> is set
+    as true. There are row filters defined on both the partitioned table
+    (<literal>parent</literal>), and on the partition (<literal>child</literal>).
+<programlisting>
+test_pub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a &lt; 5), child WHERE (a >= 5)
+test_pub-# WITH (publish_via_partition_root=true);
+CREATE PUBLICATION
+</programlisting>
+<programlisting>
+test_sub=# CREATE SUBSCRIPTION s4
+test_sub-# CONNECTION 'host=localhost dbname=test_pub application_name=s4'
+test_sub-# PUBLICATION p4;
+CREATE SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Insert some values directly into the <literal>parent</literal> and
+    <literal>child</literal> tables. They replicate using the row filter of
+    <literal>parent</literal> (because <literal>publish_via_partition_root</literal>
+    is true).
+<programlisting>
+test_pub=# INSERT INTO parent VALUES (2), (4), (6);
+INSERT 0 3
+test_pub=# INSERT INTO child VALUES (3), (5), (7);
+INSERT 0 3
+
+test_pub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+(6 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+(3 rows)
+</programlisting></para>
+
+   <para>
+    Repeat the same test, but with a different value for <literal>publish_via_partition_root</literal>.
+    The publication parameter <literal>publish_via_partition_root</literal> is
+    set as false. A row filter is defined on the partition (<literal>child</literal>).
+<programlisting>
+test_pub=# DROP PUBLICATION p4;
+DROP PUBLICATION
+test_pub=# CREATE PUBLICATION p4 FOR TABLE parent, child WHERE (a >= 5)
+test_pub-# WITH (publish_via_partition_root=false);
+CREATE PUBLICATION
+</programlisting>
+<programlisting>
+test_sub=# ALTER SUBSCRIPTION s4 REFRESH PUBLICATION;
+ALTER SUBSCRIPTION
+</programlisting></para>
+
+   <para>
+    Do the inserts on the publisher same as before. They replicate using the
+    row filter of <literal>child</literal> (because
+    <literal>publish_via_partition_root</literal> is false).
+<programlisting>
+test_pub=# TRUNCATE parent;
+TRUNCATE TABLE
+test_pub=# INSERT INTO parent VALUES (2), (4), (6);
+INSERT 0 3
+test_pub=# INSERT INTO child VALUES (3), (5), (7);
+INSERT 0 3
+
+test_pub=# SELECT * FROM parent ORDER BY a;
+ a 
+---
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+(6 rows)
+</programlisting>
+<programlisting>
+test_sub=# SELECT * FROM child ORDER BY a;
+ a 
+---
+ 5
+ 6
+ 7
+(3 rows)
+</programlisting></para>
+
+  </sect2>
+
+ </sect1>
+
  <sect1 id="logical-replication-conflicts">
   <title>Conflicts</title>
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index fb2d013..23d883c 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -254,6 +254,8 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    <literal>publish_via_partition_root</literal> determines if it uses the
    partition's row filter (if the parameter is false, the default) or the root
    partitioned table's row filter.
+   See <xref linkend="logical-replication-row-filter"/> for details about row
+   filters.
   </para>
 
   <para>
-- 
1.8.3.1

#30

smithpb2250@gmail.com

almost 4 years ago

In reply to: Euler Taveira (#28)

Re: PG DOCS - logical replication filtering

On Wed, Apr 13, 2022 at 12:08 AM Euler Taveira <euler@eulerto.com> wrote:

On Tue, Apr 12, 2022, at 5:30 AM, Peter Smith wrote:

Not changed. Because in fact, I copied most of this sentence
(including the uppercase, "operations", "and/or") from existing
documentation [1]
e.g. see "The tables added to a publication that publishes UPDATE
and/or DELETE operations must ..."
[1] https://www.postgresql.org/docs/current/sql-createpublication.html

Hmm. You are checking the Notes. I'm referring the the publish parameter. IMO
this sentence should use operations in lowercase letters too because even if
you create it with uppercase letters, Postgres will provide a lowercase word
when you dump it.

IIRC the row filter replication identity checking is at run-time based
on the operation (not at DDL time based on the publish_parameter). For
this reason, and also because this is the same format/wording used in
many places already on the create_publication.sgml, I did not change
this formatting. But I did modify [v10] as earlier suggested to
replace the “and/or” with “or”, and also added another word
“operation”.

Yeah, I know the information is the same in the summary and in the
text. Personally, I find big slabs of technical text difficult to
digest, so I'd have to spend 5 minutes of careful reading and drawing
the exact same summary on a piece of paper anyway just to visualize
what the text says. The summary makes it easy to understand at a
glance. But I have modified the summary in [v9] to remove the "case"
and to add other column headings as suggested.

Isn't it better to use a table instead of synopsis?

Modified [v10] as suggested.

Not changed. The readers of this docs page are all users who will be
familiar with the filter expressions, so I felt the "OR'ed together"
part will be perfectly clear to the intended audience.

If you want to keep it, change it to "ORed". It is used in indices.sgml. Let's
keep the consistence.

Modified [v10] as suggested.

But I did not understand your point about “If, for some reason,
someone decided to change it, this section will contain obsolete
information”, because IIUC that will be equally true for both the
explanation and the output, so I did not understand why you say "psql
output is fine, the explanation is not". It is the business of this
documentation to help the user to know how and where they can find the
row filter information they may need to know.

You are describing a psql command here. My point is keep psql explanation in
the psql section. This section is to describe the row filter feature. If we
start describing features in other sections, we will have outdated information
when the referred feature is changed and someone fails to find all references.
I tend to concentrate detailed explanation in the feature section. If I have to
add links in other sections, I use "Seee Section 1.23 for details ...".

Modified [v10] so the psql descriptions are now very generic.

Not changed. The publisher and subscriber programlistings are always
separated. If you are looking at the rendered HTML I think it is quite
clear that one is at the publisher and one is at the subscriber. OTOH,
if we omitted creating the tables on the subscriber then I think that
really would cause some confusion.

The difference is extra space. By default, the CSS does not include a border
for programlisting. That's why I complained about it. I noticed that the
website CSS includes it. However, the PDF will not include the border. I would
add a separate description for the subscriber just to be clear.

Modified [v10] as suggested to add a separate description for the subscriber.

One last suggestion, you are using identifiers in uppercase letters but
"primary key" is in lowercase.

Modified [v10] as suggested to make this uppercase

------
[v10] /messages/by-id/CAHut+PvMEYkCRWDoZSpFnP+5SExus7YzWAd=6ah9vwkfRhOnSg@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#31

euler@eulerto.com

over 3 years ago

In reply to: Peter Smith (#29)

Re: PG DOCS - logical replication filtering

On Wed, Apr 13, 2022, at 12:24 AM, Peter Smith wrote:

PSA patch v10 which addresses the remaining review comments from Euler [1]

Looks good to me.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#32

smithpb2250@gmail.com

over 3 years ago

In reply to: Euler Taveira (#31)

Re: PG DOCS - logical replication filtering

I've changed the CF entry [1]https://commitfest.postgresql.org/38/3605/ status to "ready for committer".

------
[1]: https://commitfest.postgresql.org/38/3605/

Kind Regards,
Peter Smith.
Fujitsu Australia

#33

houzj.fnst@fujitsu.com

over 3 years ago

In reply to: Peter Smith (#29)

RE: PG DOCS - logical replication filtering

On Wednesday, April 13, 2022 11:25 AM Peter Smith <smithpb2250@gmail.com> wrote:

PSA patch v10 which addresses the remaining review comments from Euler [1]

Thanks for the patch, it looks good to me.

Best regards,
Hou zj

#34

amit.kapila16@gmail.com

over 3 years ago

In reply to: Euler Taveira (#31)

Re: PG DOCS - logical replication filtering

On Thu, Apr 14, 2022 at 1:29 AM Euler Taveira <euler@eulerto.com> wrote:

On Wed, Apr 13, 2022, at 12:24 AM, Peter Smith wrote:

PSA patch v10 which addresses the remaining review comments from Euler [1]

Looks good to me.

Thanks, this looks good to me as well. I'll check this again early
next week and push unless I find something or there are more comments.

--
With Regards,
Amit Kapila.

#35

amit.kapila16@gmail.com

over 3 years ago

In reply to: Amit Kapila (#34)

Re: PG DOCS - logical replication filtering

On Thu, Apr 14, 2022 at 8:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 14, 2022 at 1:29 AM Euler Taveira <euler@eulerto.com> wrote:

On Wed, Apr 13, 2022, at 12:24 AM, Peter Smith wrote:

PSA patch v10 which addresses the remaining review comments from Euler [1]

Looks good to me.

Thanks, this looks good to me as well. I'll check this again early
next week and push unless I find something or there are more comments.

Pushed.

--
With Regards,
Amit Kapila.

#36