PG DOCS - logical replication filtering

Started by Peter Smithabout 4 years ago36 messageshackers

smithpb2250@gmail.com

about 4 years ago

Hi.

PSA a PG docs patch that is associated with the logical replication
Row Filters feature which was recently pushed [1]https://github.com/postgres/postgres/commit/52e4f0cd472d39d07732b99559989ea3b615be78.

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The main new content for this page is the "Row Filters" section. This
gives a full overview of the new row filter feature, plus examples.

------
[1]: https://github.com/postgres/postgres/commit/52e4f0cd472d39d07732b99559989ea3b615be78

Kind Regards,
Peter Smith.
Fujitsu Australia

Aleksander Alekseev

aleksander@timescale.com

about 4 years ago

In reply to: Peter Smith (#1)

Re: PG DOCS - logical replication filtering

Hi Peter,

PSA a PG docs patch that is associated with the logical replication
Row Filters feature which was recently pushed [1].

The patch looks mostly OK, but I have several nitpicks.

```
By default, all data from all published tables will be replicated to the
appropriate subscribers.
[...]
By default, all operation types are replicated.
```

The second sentence seems to be redundant.

```
(This feature is available since PostgreSQL 15)
```

Please correct me if I'm wrong, but I don't think we say that in the docs.
When the user opens the documentation for version X he or she sees
everything that is available in this version.

```
31.3. Filtering
[...]
There are 3 different ways to filter what data gets replicated.
31.3.1. Operation Filters
[...]
31.3.2. Row Filters
[...]
```
It looks like there are 2 different ways after all.

I see that a large part of the documentation is commented and marked as TBA
(Column Filters, Combining Different Kinds of Filters). Could you please
clarify if it's a work-in-progress patch? If it's not, I believe the
commented part should be removed before committing.

--
Best regards,
Aleksander Alekseev

Aleksander Alekseev

aleksander@timescale.com

about 4 years ago

In reply to: Aleksander Alekseev (#2)

Re: PG DOCS - logical replication filtering

Hi again,

The second sentence seems to be redundant.

Actually, I'm wrong on this one.

--
Best regards,
Aleksander Alekseev

Amit Kapila

amit.kapila16@gmail.com

about 4 years ago

In reply to: Aleksander Alekseev (#2)

Re: PG DOCS - logical replication filtering

On Wed, Mar 2, 2022 at 2:37 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

I see that a large part of the documentation is commented and marked as TBA (Column Filters, Combining Different Kinds of Filters). Could you please clarify if it's a work-in-progress patch? If it's not, I believe the commented part should be removed before committing.

I think we can remove any Column Filters related information
(placeholders), if that patch gets committed, we can always extend the
existing docs.

--
With Regards,
Amit Kapila.

Aleksander Alekseev

aleksander@timescale.com

about 4 years ago

In reply to: Amit Kapila (#4)

Re: PG DOCS - logical replication filtering

Hi hackers,

I see that a large part of the documentation is commented and marked as
TBA (Column Filters, Combining Different Kinds of Filters). Could you
please clarify if it's a work-in-progress patch? If it's not, I believe the
commented part should be removed before committing.

I think we can remove any Column Filters related information
(placeholders), if that patch gets committed, we can always extend the
existing docs.

Here is an updated version of the patch.

--
Best regards,
Aleksander Alekseev

Peter Eisentraut

peter_e@gmx.net

about 4 years ago

In reply to: Peter Smith (#1)

Re: PG DOCS - logical replication filtering

On 02.03.22 05:47, Peter Smith wrote:

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The pending feature to select a subset of table columns to replicate is
not "column filtering". The thread might still be still called that,
but we have changed the patch to not use that terminology.

Filtering is a dynamic action based on actual values. The row filtering
feature does that. The column list feature is a static DDL-time
configuration. It is no more filtering than specifying a list of tables
in a publication is table filtering.

So please consider organizing the documentation differently to not
create this confusion.

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Aleksander Alekseev (#5)

Re: PG DOCS - logical replication filtering

On Wed, Mar 2, 2022 at 8:43 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
...

Here is an updated version of the patch.

Thanks for your review comments and fixes. The updated v2 patch looks
good to me.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Amit Kapila (#4)

Re: PG DOCS - logical replication filtering

On Wed, Mar 2, 2022 at 8:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 2, 2022 at 2:37 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

I see that a large part of the documentation is commented and marked as TBA (Column Filters, Combining Different Kinds of Filters). Could you please clarify if it's a work-in-progress patch? If it's not, I believe the commented part should be removed before committing.

I think we can remove any Column Filters related information
(placeholders), if that patch gets committed, we can always extend the
existing docs.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

Amit Kapila

amit.kapila16@gmail.com

about 4 years ago

In reply to: Peter Eisentraut (#6)

Re: PG DOCS - logical replication filtering

On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 02.03.22 05:47, Peter Smith wrote:

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The pending feature to select a subset of table columns to replicate is
not "column filtering". The thread might still be still called that,
but we have changed the patch to not use that terminology.

Filtering is a dynamic action based on actual values. The row filtering
feature does that. The column list feature is a static DDL-time
configuration. It is no more filtering than specifying a list of tables
in a publication is table filtering.

So please consider organizing the documentation differently to not
create this confusion.

+1. I think Row Filters can directly be a section just before
Conflicts on the logical replication page [1]https://www.postgresql.org/docs/devel/logical-replication.html.

Some comments on the patch:
1. I think we can extend/add the example to have filters on more than
one table. This has been noticed multiple times during development
that people are not very clear on it.
2. I think we can add an example or two for row filters actions (like
Insert, Update).
3.
 Publications can choose to limit the changes they produce to
    any combination of <command>INSERT</command>, <command>UPDATE</command>,
-   <command>DELETE</command>, and <command>TRUNCATE</command>,
similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
+   <command>DELETE</command>, and <command>TRUNCATE</command> by using
+   <quote>operation filters</quote>.

By this, one can imply that row filters are used for Truncate as well
but that is not true. I know that that patch later specifies that "Row
filters have no effect for <command>TRUNCATE</command> commands." but
the above modification is not very clear.

[1]: https://www.postgresql.org/docs/devel/logical-replication.html

--
With Regards,
Amit Kapila.

#10

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Amit Kapila (#9)

Re: PG DOCS - logical replication filtering

On Thu, Mar 3, 2022 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 02.03.22 05:47, Peter Smith wrote:

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The pending feature to select a subset of table columns to replicate is
not "column filtering". The thread might still be still called that,
but we have changed the patch to not use that terminology.

Filtering is a dynamic action based on actual values. The row filtering
feature does that. The column list feature is a static DDL-time
configuration. It is no more filtering than specifying a list of tables
in a publication is table filtering.

So please consider organizing the documentation differently to not
create this confusion.

+1. I think Row Filters can directly be a section just before
Conflicts on the logical replication page [1].

OK. I will reorganize the page as suggested, and also attend to the
other comments below.

Some comments on the patch:
1. I think we can extend/add the example to have filters on more than
one table. This has been noticed multiple times during development
that people are not very clear on it.
2. I think we can add an example or two for row filters actions (like
Insert, Update).
3.
Publications can choose to limit the changes they produce to
any combination of <command>INSERT</command>, <command>UPDATE</command>,
-   <command>DELETE</command>, and <command>TRUNCATE</command>,
similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
+   <command>DELETE</command>, and <command>TRUNCATE</command> by using
+   <quote>operation filters</quote>.
By this, one can imply that row filters are used for Truncate as well
but that is not true. I know that that patch later specifies that "Row
filters have no effect for <command>TRUNCATE</command> commands." but
the above modification is not very clear.

[1] - https://www.postgresql.org/docs/devel/logical-replication.html

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

#11

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Aleksander Alekseev (#5)

Re: PG DOCS - logical replication filtering

PSA patch v3 to address all comments received so far.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#12

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Amit Kapila (#9)

Re: PG DOCS - logical replication filtering

On Thu, Mar 3, 2022 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 02.03.22 05:47, Peter Smith wrote:

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).

The pending feature to select a subset of table columns to replicate is
not "column filtering". The thread might still be still called that,
but we have changed the patch to not use that terminology.

Filtering is a dynamic action based on actual values. The row filtering
feature does that. The column list feature is a static DDL-time
configuration. It is no more filtering than specifying a list of tables
in a publication is table filtering.

So please consider organizing the documentation differently to not
create this confusion.

+1. I think Row Filters can directly be a section just before
Conflicts on the logical replication page [1].

Some comments on the patch:
1. I think we can extend/add the example to have filters on more than
one table. This has been noticed multiple times during development
that people are not very clear on it.

Added example in v3 [1]/messages/by-id/CAHut+Ptwp0FscVpmMjHLV6_=SHR5ndwvWdX_gb41_3H2UA9ecA@mail.gmail.com

2. I think we can add an example or two for row filters actions (like
Insert, Update).

Added examples of INSERT and UPDATE in v3 [1]/messages/by-id/CAHut+Ptwp0FscVpmMjHLV6_=SHR5ndwvWdX_gb41_3H2UA9ecA@mail.gmail.com

3.
Publications can choose to limit the changes they produce to
any combination of <command>INSERT</command>, <command>UPDATE</command>,
-   <command>DELETE</command>, and <command>TRUNCATE</command>,
similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
+   <command>DELETE</command>, and <command>TRUNCATE</command> by using
+   <quote>operation filters</quote>.
By this, one can imply that row filters are used for Truncate as well
but that is not true. I know that that patch later specifies that "Row
filters have no effect for <command>TRUNCATE</command> commands." but
the above modification is not very clear.

Fixed in v3 [1]/messages/by-id/CAHut+Ptwp0FscVpmMjHLV6_=SHR5ndwvWdX_gb41_3H2UA9ecA@mail.gmail.com. Restored original text, and added a note about row filters.

------
[1]: /messages/by-id/CAHut+Ptwp0FscVpmMjHLV6_=SHR5ndwvWdX_gb41_3H2UA9ecA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.

#13

Euler Taveira

euler@eulerto.com

about 4 years ago

In reply to: Peter Smith (#11)

Re: PG DOCS - logical replication filtering

On Fri, Mar 4, 2022, at 12:41 AM, Peter Smith wrote:

PSA patch v3 to address all comments received so far.

Peter,

I started reviewing this documentation for row filter and I noticed that I was
changing too much lines to submit a patch on the top of it. Hence, I'm
attaching a new version based on this one.

Here as some of the changes that I did:

* links: I renamed the ids to use "logical-replication-row-filter" instead of
"logical-replication-rf" because it is used in the anchors. A compound word
is better than an abbreviation.
* titles: I changed all titles because of some stylish issue (words are
generally capitalized) or because it reads better.
* sections: I moved the "Restrictions" section to the top but it seems
important than the other sections. I removed the "psql" section. The psql
commands are shown in the "Examples" section so I don't think we should
provide a section for it.
* sentences: I expanded several sentences to provide details about the specific
topic. I also reordered some sentences and removed some duplicated sentences.
* replica identity: I added a link to replica identity. It is a key concept for
row filter.
* transformations: I added a few sentences explaining when/why a transformation
is required. I removed the "Cases" part (same as in the source code) because
it is redundant with the new sentences.
* partitioned table: the title should be _partitioned_ table. I replaced the
bullets with sentences in the same paragraph.
* initial data sync: I removed the "subscriber" from the section title. When
you say "initial data synchronization" it seems clear we're talking about the
subscriber. I also add a sentence saying why pre-15 does not consider row
filters.
* combining row filters: I renamed the section and decided to remove "for the
same table". When reading the first sentences from this section, it is clear
that row filtering is per table. So if you are combining multiple row
filters, it should be for the same table. I also added a sentence saying why
some clauses make the row filter irrelevant.
* examples: I combined the psql commands that shows row filter information
together (\dRp+ and \d). I changed to connection string to avoid "localhost".
Why? It does not seem a separate service (there is no port) and setup pub/sub
in the same cluster requires additional steps. It is better to illustrate
different clusters (at least it seems so since we don't provide details from
publisher). I changed a value in an UPDATE because both UPDATEs use 999.

We could probably reduce the number of rows in the example but I didn't bother
to remove them.

It seems we can remove some sentences from the CREATE PUBLICATION because we
have a new section that explains all of it. I think the link that was added by
this patch is sufficient.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#14

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Euler Taveira (#13)

Re: PG DOCS - logical replication filtering

On Fri, Mar 11, 2022 at 9:37 AM Euler Taveira <euler@eulerto.com> wrote:

On Fri, Mar 4, 2022, at 12:41 AM, Peter Smith wrote:

PSA patch v3 to address all comments received so far.

Peter,

I started reviewing this documentation for row filter and I noticed that I was
changing too much lines to submit a patch on the top of it. Hence, I'm
attaching a new version based on this one.

Sorry for my long delay in replying. I have been away.

Thanks very much for the updated patch with all your suggestions.
There were many changes in your v4. I have not merged everything
exactly, but I did take the majority of your suggestions on-board in
the attached v5.

I responded below to each change.

Here as some of the changes that I did:

* links: I renamed the ids to use "logical-replication-row-filter" instead of
"logical-replication-rf" because it is used in the anchors. A compound word
is better than an abbreviation.

OK, done as suggested.

* titles: I changed all titles because of some stylish issue (words are
generally capitalized) or because it reads better.

OK, most titles changed as suggested.

* sections: I moved the "Restrictions" section to the top but it seems
important than the other sections. I removed the "psql" section. The psql
commands are shown in the "Examples" section so I don't think we should
provide a section for it.

OK, moved the "Restrictions" section and removed the "psql" section.

* sentences: I expanded several sentences to provide details about the specific
topic. I also reordered some sentences and removed some duplicated sentences.

I did not take everything exactly as in your v4, but wherever your
suggested changes added some more information I tried to include them
using similar wording to yours.

* replica identity: I added a link to replica identity. It is a key concept for
row filter.

OK, done as suggested.

* transformations: I added a few sentences explaining when/why a transformation
is required. I removed the "Cases" part (same as in the source code) because
it is redundant with the new sentences.

OK, I incorporated most of your new descriptions for the
transformations, however I still wanted to keep the summary of "cases"
part because IMO it makes the rules so much clearer than just having
the text descriptions.

* partitioned table: the title should be _partitioned_ table. I replaced the
bullets with sentences in the same paragraph.

OK. The title was changed, but I kept the bullets.

* initial data sync: I removed the "subscriber" from the section title. When
you say "initial data synchronization" it seems clear we're talking about the
subscriber. I also add a sentence saying why pre-15 does not consider row
filters.

OK. Title is changed. Also I added your sentence about the pre-15. But
I kept all the HTML note rendering that you'd removed in v4. I think
this information was important enough to be a "note" instead of just
buried in a paragraph.

* combining row filters: I renamed the section and decided to remove "for the
same table". When reading the first sentences from this section, it is clear
that row filtering is per table. So if you are combining multiple row
filters, it should be for the same table. I also added a sentence saying why
some clauses make the row filter irrelevant.

OK. Title is changed. Your extra sentence was added.

* examples: I combined the psql commands that shows row filter information
together (\dRp+ and \d). I changed to connection string to avoid "localhost".
Why? It does not seem a separate service (there is no port) and setup pub/sub
in the same cluster requires additional steps. It is better to illustrate
different clusters (at least it seems so since we don't provide details from
publisher). I changed a value in an UPDATE because both UPDATEs use 999.

I did not combine those \dRp+ and \d. I kept them separate so I could
write some separate notes about them.

I'm unsure about the connection change. This documentation is for "Row
Filters" so the examples were only intended to demonstrate row filters
and nothing else. I wanted to have a trivial connection such that a
user can just cut/paste directly from the example and get something
they can immediately test without having to change it. I don't mind
changing this later but probably I'd like to get some other opinions
about it first.

About the UPDATE (555 value) - OK, I changed that value as you suggested.

We could probably reduce the number of rows in the example but I didn't bother
to remove them.

It seems we can remove some sentences from the CREATE PUBLICATION because we
have a new section that explains all of it. I think the link that was added by
this patch is sufficient.

Yeah, maybe some sentences can be removed. But even though some
information is duplicated it might be useful to have a few things
still mentioned on the CREATE PUBLICATION page just so the user does
not have to chase links too much. I will wait to see if other people
have an opinion about it before removing any content from that page.
Meanwhile, I have made the create_publication.sgml identical to your
v4.

~~~

There should be much fewer v4/v5 differences now although there might
be a few things I missed that you want to re-comment on. Hopefully, it
will now be easier to post review comments as BEFORE/AFTER text
fragments - that would help other people to see the suggestions more
easily and give their opinions too.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

#15

Amit Kapila

amit.kapila16@gmail.com

about 4 years ago

In reply to: Peter Smith (#14)

Re: PG DOCS - logical replication filtering

On Thu, Mar 24, 2022 at 11:48 AM Peter Smith <smithpb2250@gmail.com> wrote:

Review comments:
===============
1.
+    The <literal>WHERE</literal> clause expression is evaluated with the same
+    role used for the replication connection. i.e. the role specified in the
+    <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>.

Can we use () around i.e. sentence? It looks bit odd to me now.
The <literal>WHERE</literal> clause expression is evaluated with the
same role used for the replication connection (i.e., the role
specified in the <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>).

2.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).

Can we write the below part slightly differently?
Before:
(i.e. before and after the data is updated).
After:
(i.e the data before and after the update).

3.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>

I think we can combine these three separate paragraphs.

4.
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>

I think we can combine this into single para as Euler had in his version.

5.
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored
when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>

I don't see the need for the first <note> here, the second one seems
to convey it.

6.
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int primary key, e int, f int);
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int primary key, h int, i int);
+CREATE TABLE
+</programlisting>
+   </para>
+
+   <para>
+    Create some publications.
+   </para>
+   <para>
+    - notice publication <literal>p1</literal> has 1 table with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p2</literal> has 2 tables, one without a row
+    filter, and one with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p3</literal> has 2 tables, both
with row filters.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+   </para>

I think it is better to use the corresponding content from Euler's version.

7.
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+   </para>
+   <para>
+    - notice table <literal>t1</literal> is a member of 2 publications, but
+    has a row filter only in <literal>p1</literal>.
+   </para>
+   <para>
+    - notice table <literal>t2</literal> is a member of 2 publications, and
+    has a different row filter in each of them.

This looks unnecessary to me. Let's remove this part.

8.
+   <para>
+    - notice that only the rows satisfying the <literal>t1 WHERE</literal>
+    clause of publication <literal>p1</literal> are replicated.

Again, it is better to use Euler's version for this and at the place,
he had in his version. Similarly, adjust other notices if any like
this one.

9. I suggest adding an example for partition tables showing how the
row filter is used based on the 'publish_via_partition_root'
parameter.

--
With Regards,
Amit Kapila.

#16

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Amit Kapila (#15)

Re: PG DOCS - logical replication filtering

PSA patch v6 to address some of Amit's review comments [1]/messages/by-id/CAA4eK1JdwQQsxa+zpsBW5rCxEfXopYx381nwcCyeXk6mpF8ChQ@mail.gmail.com.

------
[1]: /messages/by-id/CAA4eK1JdwQQsxa+zpsBW5rCxEfXopYx381nwcCyeXk6mpF8ChQ@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#17

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Amit Kapila (#15)

Re: PG DOCS - logical replication filtering

On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 24, 2022 at 11:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
Review comments:
===============
1.
+    The <literal>WHERE</literal> clause expression is evaluated with the same
+    role used for the replication connection. i.e. the role specified in the
+    <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>.
Can we use () around i.e. sentence? It looks bit odd to me now.
The <literal>WHERE</literal> clause expression is evaluated with the
same role used for the replication connection (i.e., the role
specified in the <literal>CONNECTION</literal> clause of the <xref
linkend="sql-createsubscription"/>).

OK. Modified in v6 [1]/messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com.

2.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
Can we write the below part slightly differently?
Before:
(i.e. before and after the data is updated).
After:
(i.e the data before and after the update).

OK. Modified in v6 [1]/messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com.

3.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>

I think we can combine these three separate paragraphs.

The first sentence is the explanation, then there are the 3 separate
“If …” cases mentioned. It doesn’t seem quite right to group just the
first 2 “if…” cases into one paragraph, while leaving the 3rd one
separated. OTOH combining everything into one big paragraph seems
worse. Please confirm if you still want this changed.

4.
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>

I think we can combine this into single para as Euler had in his version.

We can do it, but I am not sure if your review was looking at the
rendered HTML or just looking at the SGML text? IMO using bullets here
ended up being more readable (it is also consistent with other bullet
usages on this page). Please confirm if you still want this changed.

5.
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored
when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>

I don't see the need for the first <note> here, the second one seems
to convey it.

Well, the 2nd note is only about compatibility with older versions
doing the subscribe. But the 1st note is not version-specific at all.
It is saying that the COPY does not take the “publish” option into
account. If you know of some other docs already mentioning this subtle
behaviour of the COPY then I can remove this note and just
cross-reference to the other place. But I did not know anywhere this
is already mentioned, so that is why I wrote the note about it.

6.
+   <para>
+    Create some tables to be used in the following examples.
+<programlisting>
+testpub=# CREATE TABLE t1(a int, b int, c text, primary key(a,c));
+CREATE TABLE
+testpub=# CREATE TABLE t2(d int primary key, e int, f int);
+CREATE TABLE
+testpub=# CREATE TABLE t3(g int primary key, h int, i int);
+CREATE TABLE
+</programlisting>
+   </para>
+
+   <para>
+    Create some publications.
+   </para>
+   <para>
+    - notice publication <literal>p1</literal> has 1 table with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p2</literal> has 2 tables, one without a row
+    filter, and one with a row filter.
+   </para>
+   <para>
+    - notice publication <literal>p3</literal> has 2 tables, both
with row filters.
+<programlisting>
+testpub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a > 5 AND c = 'NSW');
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p2 FOR TABLE t1, t2 WHERE (e = 99);
+CREATE PUBLICATION
+testpub=# CREATE PUBLICATION p3 FOR TABLE t2 WHERE (d = 10), t3 WHERE (g = 10);
+CREATE PUBLICATION
+</programlisting>
+   </para>

I think it is better to use the corresponding content from Euler's version.

OK. Modified in v6 [1]/messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com. I changed the primary key syntax to be the
same as Euler had. I also moved all the 'notice' parts to below the
corresponding example and modified the text.

7.
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+   </para>
+   <para>
+    - notice table <literal>t1</literal> is a member of 2 publications, but
+    has a row filter only in <literal>p1</literal>.
+   </para>
+   <para>
+    - notice table <literal>t2</literal> is a member of 2 publications, and
+    has a different row filter in each of them.

This looks unnecessary to me. Let's remove this part.

I thought something is needed to explain/demonstrate how the user can
know the different row filters for all the publications that the same
table is a member of. Otherwise, the user has to guess (??) what
publications are using their table and then use \dRp+ to dig at all
those publications to find the row filters.

I can remove all this part from the Examples, but I think at least the
\d should still be mentioned somewhere. IMO I should put that "PSQL
commands" section back (which existed in an earlier version) and just
add a sentence about this. Then this examples part can be removed.
What do you think?

8.
+   <para>
+    - notice that only the rows satisfying the <literal>t1 WHERE</literal>
+    clause of publication <literal>p1</literal> are replicated.
Again, it is better to use Euler's version for this and at the place,
he had in his version. Similarly, adjust other notices if any like
this one.

OK. Modified in v6 [1]/messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com. Every “notice” has now been moved to follow
the associated example (how Euler had them)

9. I suggest adding an example for partition tables showing how the
row filter is used based on the 'publish_via_partition_root'
parameter.

OK - I am working on this and will add it in a future patch version.

------
[1]: /messages/by-id/CAHut+PvyxMedYY-jHaT9YSfEPHv0jU2-CZ8F_nPvhuP0b955og@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#18

Alvaro Herrera

alvherre@2ndquadrant.com

about 4 years ago

In reply to: Peter Smith (#16)

Re: PG DOCS - logical replication filtering

On 2022-Apr-08, Peter Smith wrote:

PSA patch v6 to address some of Amit's review comments [1].

I think the text is good (didn't read it all, just about the first
half), but why is there one paragraph per sentence? Seems a bit too
sparse.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick." (Andrew Sullivan)

#19

Amit Kapila

amit.kapila16@gmail.com

about 4 years ago

In reply to: Peter Smith (#17)

Re: PG DOCS - logical replication filtering

On Fri, Apr 8, 2022 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
3.
+   <para>
+    Whenever an <command>UPDATE</command> is processed, the row filter
+    expression is evaluated for both the old and new row (i.e. before
+    and after the data is updated).
+   </para>
+
+   <para>
+    If both evaluations are <literal>true</literal>, it replicates the
+    <command>UPDATE</command> change.
+   </para>
+
+   <para>
+    If both evaluations are <literal>false</literal>, it doesn't replicate
+    the change.
+   </para>
I think we can combine these three separate paragraphs.
The first sentence is the explanation, then there are the 3 separate
“If …” cases mentioned. It doesn’t seem quite right to group just the
first 2 “if…” cases into one paragraph, while leaving the 3rd one
separated. OTOH combining everything into one big paragraph seems
worse. Please confirm if you still want this changed.

Yeah, I think we should have something like what Euler's version had
and maybe keep a summary section from the current patch.

4.
+   <para>
+    If the publication contains a partitioned table, the publication parameter
+    <literal>publish_via_partition_root</literal> determines which row filter
+    is used.
+    <itemizedlist>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>false</literal>
+       (default), each <emphasis>partition's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If <literal>publish_via_partition_root</literal> is
<literal>true</literal>,
+       the <emphasis>root partitioned table's</emphasis> row filter is used.
+      </para>
+     </listitem>
+
+    </itemizedlist>
+   </para>
I think we can combine this into single para as Euler had in his version.
We can do it, but I am not sure if your review was looking at the
rendered HTML or just looking at the SGML text? IMO using bullets here
ended up being more readable (it is also consistent with other bullet
usages on this page). Please confirm if you still want this changed.

Fair enough. We can keep this part as it is.

5.
+   <note>
+    <para>
+     Publication <literal>publish</literal> operations are ignored
when copying pre-existing table data.
+    </para>
+   </note>
+
+   <note>
+    <para>
+     If the subscriber is in a release prior to 15, copy pre-existing data
+     doesn't use row filters even if they are defined in the publication.
+     This is because old releases can only copy the entire table data.
+    </para>
+   </note>
I don't see the need for the first <note> here, the second one seems
to convey it.
Well, the 2nd note is only about compatibility with older versions
doing the subscribe. But the 1st note is not version-specific at all.
It is saying that the COPY does not take the “publish” option into
account. If you know of some other docs already mentioning this subtle
behaviour of the COPY then I can remove this note and just
cross-reference to the other place. But I did not know anywhere this
is already mentioned, so that is why I wrote the note about it.

I don't see the need to say about general initial sync (COPY) behavior
here. It is already defined at [1]. If we want to enhance, we can do
that as a separate patch to make changes where the initial sync is
explained. I am not sure that is required though.

7.
+   <para>
+    The PSQL command <command>\d</command> shows what publications the table is
+    a member of, as well as that table's row filter expression (if defined) in
+    those publications.
+   </para>
+   <para>
+    - notice table <literal>t1</literal> is a member of 2 publications, but
+    has a row filter only in <literal>p1</literal>.
+   </para>
+   <para>
+    - notice table <literal>t2</literal> is a member of 2 publications, and
+    has a different row filter in each of them.
This looks unnecessary to me. Let's remove this part.
I thought something is needed to explain/demonstrate how the user can
know the different row filters for all the publications that the same
table is a member of. Otherwise, the user has to guess (??) what
publications are using their table and then use \dRp+ to dig at all
those publications to find the row filters.

I can remove all this part from the Examples, but I think at least the
\d should still be mentioned somewhere. IMO I should put that "PSQL
commands" section back (which existed in an earlier version) and just
add a sentence about this. Then this examples part can be removed.
What do you think?

I think the way it is changed in the current patch by moving that
explanation down seems okay to me.

I feel in the initial "Row Filters" and "Row Filter Rules" sections,
we don't need to have separate paragraphs. I think the same is pointed
out by Alvaro as well.

--
With Regards,
Amit Kapila.

#20

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Peter Smith (#16)

Re: PG DOCS - logical replication filtering

PSA patch v7 which addresses all the remaining review comments (from
Amit [1a][1b], and from Alvaro [2]/messages/by-id/202204091045.v2ei4yupxqso@alvherre.pgsql).

------
[1a] /messages/by-id/CAHut+PvDFWGUOBugYMtcXhAiViZu+Q6P-kxw2+U83VOGx0Osdg@mail.gmail.com
[1b] /messages/by-id/CAA4eK1JPyVoc1dUjeqbPd9D0_uYxWyyx-8fcsrgiZ5Tpr9OAuw@mail.gmail.com
[2]: /messages/by-id/202204091045.v2ei4yupxqso@alvherre.pgsql

Kind Regards,
Peter Smith.
Fujitsu Australia

#21

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Peter Smith (#17)

#22

Peter Smith

smithpb2250@gmail.com

about 4 years ago

In reply to: Amit Kapila (#19)