pg_get_publication_tables() output duplicate relid
Hi hackers,
In another thread[1]/messages/by-id/CAJcOf-eURu03QNmD=37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw@mail.gmail.com, we found the pg_get_publication_tables function will output
duplicate partition relid when adding both child and parent table to the
publication(pubviaroot = false).
Example:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 partition of tbl1 for values from (1) to (10);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);
select * from pg_get_publication_tables('pub');
relid
-------
16387
16390
16387
The reason of the duplicate output is that:
pg_get_publication_tables invokes function GetPublicationRelations internally.
In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
twice. First time from extracting partitions from the specified parent table
'tbl1', second time from the explicitly specified partition 'tbl1_part1'.
I am not sure is this behavior expected as it seems natural for
pg_get_publication_tables to return duplicate-free relid list. OTOH, there
seems no harm for the current behavior(duplicate output), it doesn't affect the
initial sync and change replication when using logical replication.
Personally, I think it might be better to make the output of
pg_get_publication_tables duplicate-free, because the change happened on each
output relid will only be replicated once. So, it seems more consistent to
output each relid only once.
Thoughts ?
(Attach a patch which make the output duplicate-free)
[1]: /messages/by-id/CAJcOf-eURu03QNmD=37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw@mail.gmail.com
Best regards,
Hou zj
Attachments:
0001-fix-duplicate-table-in-pg_publication_tables.patchapplication/octet-stream; name=0001-fix-duplicate-table-in-pg_publication_tables.patchDownload+16-2
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
Hi hackers,
In another thread[1], we found the pg_get_publication_tables function will output
duplicate partition relid when adding both child and parent table to the
publication(pubviaroot = false).Example:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 partition of tbl1 for values from (1) to (10);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);select * from pg_get_publication_tables('pub');
relid
-------
16387
16390
16387The reason of the duplicate output is that:
pg_get_publication_tables invokes function GetPublicationRelations internally.
In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
twice. First time from extracting partitions from the specified parent table
'tbl1', second time from the explicitly specified partition 'tbl1_part1'.I am not sure is this behavior expected as it seems natural for
pg_get_publication_tables to return duplicate-free relid list. OTOH, there
seems no harm for the current behavior(duplicate output), it doesn't affect the
initial sync and change replication when using logical replication.Personally, I think it might be better to make the output of
pg_get_publication_tables duplicate-free, because the change happened on each
output relid will only be replicated once. So, it seems more consistent to
output each relid only once.Thoughts ?
(Attach a patch which make the output duplicate-free)
[1] /messages/by-id/CAJcOf-eURu03QNmD=37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw@mail.gmail.com
The users can always specify the distinct clause, see [1]postgres=# select * from pg_get_publication_tables('pub'); relid ------- 16387 16390 16387 (3 rows). I don't see
any problem with the existing way. If required you can specify in the
view pg_publication_tables documentation that "This view returns
multiple rows for the same relation, if the relation is child to a
parent partition table and if both the parent and child are specified
in the publication" or something similart.
If at all, the pg_get_publication_tables() is supposed to give unique
outputs, let's fix it and it is a good idea to specify that the view
pg_publication_tables returns unique rows even though child and parent
partition tables are specified in the publication.
I have fdw comments on the patch:
1) Why to do result = list_concat_unique_oid(result, relids); within
the while loop which can make the while loop O(n*n*n)
complexity(list_concat_unique_oid is O(n*n)? Can't you use
list_append_unique, of course this can also be costlier? Let's avoid
adding anything to the while loop and use
list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n))
/* Now sort and de-duplicate the result list */
list_sort(result, list_oid_cmp);
list_deduplicate_oid(result);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
Form_pg_publication_rel pubrel;
+ List *relids = NIL;
pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
- result = GetPubPartitionOptionRelations(result, pub_partopt,
+ relids = GetPubPartitionOptionRelations(relids, pub_partopt,
pubrel->prrelid);
+
+ result = list_concat_unique_oid(result, relids);
}
2) Are you sure you want GetPublicationRelations to be returning the
unique relations? I'm just thinking why can't you just do
list_sort+list_deduplicate_oid in the caller? I think this makes the
function more restrictive. If required, you can pass a boolean flag
(bool give_unique and specify in the function comments and if passed
true do list_sort+list_deduplicate_oid at the end of
GetPublicationRelations ).
[1]: postgres=# select * from pg_get_publication_tables('pub'); relid ------- 16387 16390 16387 (3 rows)
postgres=# select * from pg_get_publication_tables('pub');
relid
-------
16387
16390
16387
(3 rows)
postgres=# select distinct * from pg_get_publication_tables('pub');
relid
-------
16387
16390
(2 rows)
postgres=# select * from pg_publication_tables;
pubname | schemaname | tablename
---------+------------+------------
pub | public | tbl1_part1
pub | public | tbl1_part2
pub | public | tbl1_part1
(3 rows)
postgres=# select distinct * from pg_publication_tables;
pubname | schemaname | tablename
---------+------------+------------
pub | public | tbl1_part1
pub | public | tbl1_part2
(2 rows)
Regards,
Bharath Rupireddy.
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 partition of tbl1 for values from (1) to (10);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);
In the name of consistency, I think this situation should be an error --
I mean, if we detect that the user is trying to add both the partitioned
table *and* its partition, then all manner of things are possibly going
to go wrong in some way, so my inclination is to avoid it altogether.
Is there any reason to allow that?
If we do that, then we have to be careful with later ALTER PUBLICATION
too: adding a partition is not allowed if its partitioned table is
there, and adding a partitioned table should behave in some sensible way
if one of its partitions is there (either removing the partition at the
same time, or outright rejecting the operation.)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Mon, Nov 15, 2021 6:17 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:Hi hackers,
In another thread[1], we found the pg_get_publication_tables function
will output duplicate partition relid when adding both child and
parent table to the publication(pubviaroot = false).Example:
create table tbl1 (a int) partition by range (a); create table
tbl1_part1 partition of tbl1 for values from (1) to (10); create table
tbl1_part2 partition of tbl1 for values from (10) to (20); create
publication pub for table tbl1, tbl1_part1 with
(publish_via_partition_root=false);select * from pg_get_publication_tables('pub'); relid
-------
16387
16390
16387The reason of the duplicate output is that:
pg_get_publication_tables invokes function GetPublicationRelations
internally.
In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
twice. First time from extracting partitions from the specified parent
table 'tbl1', second time from the explicitly specified partition 'tbl1_part1'.I am not sure is this behavior expected as it seems natural for
pg_get_publication_tables to return duplicate-free relid list. OTOH,
there seems no harm for the current behavior(duplicate output), it
doesn't affect the initial sync and change replication when using logicalreplication.
Personally, I think it might be better to make the output of
pg_get_publication_tables duplicate-free, because the change happened
on each output relid will only be replicated once. So, it seems more
consistent to output each relid only once.Thoughts ?
(Attach a patch which make the output duplicate-free)
[1]
/messages/by-id/CAJcOf-eURu03QNmD=37Ptsxu
NW4nBGN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com
The users can always specify the distinct clause, see [1]. I don't see any problem
with the existing way. If required you can specify in the view
pg_publication_tables documentation that "This view returns multiple rows for
the same relation, if the relation is child to a parent partition table and if both
the parent and child are specified in the publication" or something similart.
Thanks for the response.
Yes, I agreed, as I said there's no harm for the current behavior. If most people don't think
It's worth changing the behavior, I am fine with that.
If at all, the pg_get_publication_tables() is supposed to give unique outputs,
let's fix it and it is a good idea to specify that the view pg_publication_tables
returns unique rows even though child and parent partition tables are specified
in the publication.I have fdw comments on the patch:
Let's avoid adding anything to the while
loop and use list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n))
Thanks for the comments, I will address this if we finally decide to fix it.
2) Are you sure you want GetPublicationRelations to be returning the unique
relations? I'm just thinking why can't you just do list_sort+list_deduplicate_oid
in the caller? I think this makes the function more restrictive. If required, you
can pass a boolean flag (bool give_unique and specify in the function
comments and if passed true do list_sort+list_deduplicate_oid at the end of
GetPublicationRelations ).
I change the GetPublicationRelations because I found no caller need the
duplicate oid, and de-duplicate oid in GetPublicationRelations could reduce
some code change. I think if we need the duplicate oid in the future, we can
add the flag as you suggested.
Best regards,
Hou zj
On Mon, Nov 15, 2021 9:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:create table tbl1 (a int) partition by range (a); create table
tbl1_part1 partition of tbl1 for values from (1) to (10); create
table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table tbl1, tbl1_part1 with
(publish_via_partition_root=false);In the name of consistency, I think this situation should be an error -- I mean, if
we detect that the user is trying to add both the partitioned table *and* its
partition, then all manner of things are possibly going to go wrong in some way,
so my inclination is to avoid it altogether.Is there any reason to allow that?
If we do that, then we have to be careful with later ALTER PUBLICATION
too: adding a partition is not allowed if its partitioned table is there, and adding
a partitioned table should behave in some sensible way if one of its partitions is
there (either removing the partition at the same time, or outright rejecting the
operation.)
Thanks for the response.
If we decide to disallow this case, we seem need to handle some other cases as
well, for example: We might also need additional check when ATTACH a partition,
because the partition's parent table could already be published in the same
publication as the partition. During the check we might also need to lock all
the parent tables to handle concurrently change which looks complicated to me :(
I am not sure is it worth adding these lock and check. Maybe we can leave the current
behavior as is ?
Best regards,
Hou zj
On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:create table tbl1 (a int) partition by range (a);
create table tbl1_part1 partition of tbl1 for values from (1) to (10);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);In the name of consistency, I think this situation should be an error --
I mean, if we detect that the user is trying to add both the partitioned
table *and* its partition, then all manner of things are possibly going
to go wrong in some way, so my inclination is to avoid it altogether.Is there any reason to allow that?
I think it could provide flexibility to users to later change
"publish_via_partition_root" option. Because when that option is
false, we use individual partitions schema to send changes and when it
is true, we use root table's schema to send changes. Added Amit L. to
know if he has any thoughts on this matter as he was the author of
this work?
--
With Regards,
Amit Kapila.
On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Mon, Nov 15, 2021 9:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:create table tbl1 (a int) partition by range (a); create table
tbl1_part1 partition of tbl1 for values from (1) to (10); create
table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table tbl1, tbl1_part1 with
(publish_via_partition_root=false);In the name of consistency, I think this situation should be an error -- I mean, if
we detect that the user is trying to add both the partitioned table *and* its
partition, then all manner of things are possibly going to go wrong in some way,
so my inclination is to avoid it altogether.Is there any reason to allow that?
If we do that, then we have to be careful with later ALTER PUBLICATION
too: adding a partition is not allowed if its partitioned table is there, and adding
a partitioned table should behave in some sensible way if one of its partitions is
there (either removing the partition at the same time, or outright rejecting the
operation.)Thanks for the response.
If we decide to disallow this case, we seem need to handle some other cases as
well, for example: We might also need additional check when ATTACH a partition,
because the partition's parent table could already be published in the same
publication as the partition.
What kind of additional checks you are envisioning and why?
--
With Regards,
Amit Kapila.
On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:On Mon, Nov 15, 2021 9:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:create table tbl1 (a int) partition by range (a); create table
tbl1_part1 partition of tbl1 for values from (1) to (10); create
table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table tbl1, tbl1_part1 with
(publish_via_partition_root=false);In the name of consistency, I think this situation should be an
error -- I mean, if we detect that the user is trying to add both
the partitioned table *and* its partition, then all manner of things
are possibly going to go wrong in some way, so my inclination is to avoid it
altogether.Is there any reason to allow that?
If we do that, then we have to be careful with later ALTER
PUBLICATION
too: adding a partition is not allowed if its partitioned table is
there, and adding a partitioned table should behave in some sensible
way if one of its partitions is there (either removing the partition
at the same time, or outright rejecting the
operation.)Thanks for the response.
If we decide to disallow this case, we seem need to handle some other
cases as well, for example: We might also need additional check when
ATTACH a partition, because the partition's parent table could already
be published in the same publication as the partition.What kind of additional checks you are envisioning and why?
For example:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 (like tbl1);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);
--- We might need addition check here
Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10);
In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a
publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1'
which seems the case we want to disallow(both parent and child table in
publication). So, When ATTACH, I thought we might need to check all the parent
tables to see if any parent table is in the same publication which the table to
be attached is also belongs to. Does it make sense ?
Best regards,
Hou zj
On Tue, Nov 16, 2021 at 10:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:create table tbl1 (a int) partition by range (a);
create table tbl1_part1 partition of tbl1 for values from (1) to (10);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);In the name of consistency, I think this situation should be an error --
I mean, if we detect that the user is trying to add both the partitioned
table *and* its partition, then all manner of things are possibly going
to go wrong in some way, so my inclination is to avoid it altogether.Is there any reason to allow that?
I think it could provide flexibility to users to later change
"publish_via_partition_root" option. Because when that option is
false, we use individual partitions schema to send changes and when it
is true, we use root table's schema to send changes. Added Amit L. to
know if he has any thoughts on this matter as he was the author of
this work?
FWIW, I'm not sure that adding an error in this path, that is, when a
user adds both a partitioned table and its partitions to a
publication, helps much. As for the safety of allowing it, if you
look at get_rel_sync_entry(), which handles much of the complexity of
determining whether to publish a relation's changes and the schema to
use when doing so, you may be able to see that a partition being added
duplicatively is harmless, modulo any undiscovered bugs. At least as
far as the post-initial-sync replication functionality is concerned.
What IS problematic is what a subscriber sees in the
pg_publication_tables view and the problem occurs only in the initial
sync phase, where the partition is synced duplicatively because of
being found in the view along with the parent in this case, that is,
when publish_via_partiiton_root is true. I was saying on the other
thread [1]/messages/by-id/CA+HiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ@mail.gmail.com that we should leave it up to the subscriber to decide what
to do when the view (duplicatively) returns both the parent and the
partition, citing the use case that a subscriber may want to replicate
the parent and the partition as independent tables. Though I now tend
to agree with Amit K that that may be such a meaningful and all that
common use case, and the implementation on the subscriber side would
have to be unnecessarily complex.
So maybe it makes sense to just do what has been proposed --
de-duplicate partitions out of the pg_publication_tables view, unless
we know of a bigger problem that requires us to hack the subscriber
side of things too. Actually, I came to know of one such problem
while thinking about this today: when you ATTACH a partition to a
table that is present in a publish_via_partition_root=true
publication, it doesn't get copied via the initial sync even though
subsequent replication works just fine. The reason for that is that
the subscriber only syncs the partitions that are known at the time
when the parent is first synced, that too via the parent (as SELECT
<columns..> FROM parent), and then marks the parent as sync-done.
Refreshing the subscription after ATTACHing doesn't help, because the
subscriber can't see any partitions to begin with. Maybe a more
elaborate solution is needed for this one, though I haven't figured
what it is going to look like yet.
Thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CA+HiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ@mail.gmail.com
On Wed, Nov 17, 2021 at 12:15 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com wrote:
If we decide to disallow this case, we seem need to handle some other
cases as well, for example: We might also need additional check when
ATTACH a partition, because the partition's parent table could already
be published in the same publication as the partition.What kind of additional checks you are envisioning and why?
For example:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 (like tbl1);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);--- We might need addition check here Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10);In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a
publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1'
which seems the case we want to disallow(both parent and child table in
publication). So, When ATTACH, I thought we might need to check all the parent
tables to see if any parent table is in the same publication which the table to
be attached is also belongs to. Does it make sense ?
I don't think creating or attaching a partition of a table that is
present in a publish_via_partition_root=false actually adds the
partition to pg_publication_rel, the base catalog. A partition's
membership in the publication is implicit, unless of course you add it
to the publication explicitly, like all the examples we have been
discussing. I guess we're only arguing about the problems with the
pg_publication_tables view, which does expand the partitioned table to
show the partitions that are not otherwise not present in the base
catalog.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Wed, Nov 17, 2021 at 3:09 PM Amit Langote <amitlangote09@gmail.com> wrote:
Though I now tend
to agree with Amit K that that may be such a meaningful and all that
common use case,
Oops I meant: that may NOT be such...
--
Amit Langote
EDB: http://www.enterprisedb.com
On Wed, Nov 17, 2021 2:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Nov 17, 2021 at 12:15 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com wrote:
If we decide to disallow this case, we seem need to handle some other
cases as well, for example: We might also need additional check when
ATTACH a partition, because the partition's parent table could already
be published in the same publication as the partition.What kind of additional checks you are envisioning and why?
For example:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 (like tbl1);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);--- We might need addition check here Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10);In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a
publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1'
which seems the case we want to disallow(both parent and child table in
publication). So, When ATTACH, I thought we might need to check all the parent
tables to see if any parent table is in the same publication which the table to
be attached is also belongs to. Does it make sense ?I don't think creating or attaching a partition of a table that is
present in a publish_via_partition_root=false actually adds the
partition to pg_publication_rel, the base catalog. A partition's
membership in the publication is implicit, unless of course you add it
to the publication explicitly, like all the examples we have been
discussing. I guess we're only arguing about the problems with the
pg_publication_tables view, which does expand the partitioned table to
show the partitions that are not otherwise not present in the base
catalog.
Maybe I didn't make it clear, I was trying to explain that it would be
complicated if we want to completely disallow specifying both child and parent
table in the publication because of the ATTACH case I gave. In other words, I
think it's fine to specify both child and parent table in the publication.
Best regards,
Hou zj
On Thu, Nov 18, 2021 at 9:33 houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
wrote:
On Wed, Nov 17, 2021 2:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Nov 17, 2021 at 12:15 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com wrote:
If we decide to disallow this case, we seem need to handle some
other
cases as well, for example: We might also need additional check
when
ATTACH a partition, because the partition's parent table could
already
be published in the same publication as the partition.
What kind of additional checks you are envisioning and why?
For example:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 (like tbl1);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);--- We might need addition check here Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to(10);
In the above cases, 'tbl1_part1' is not a partition of 'tb1' when
creating a
publication. After the ATTACH, 'tbl1_part1' would become a partition
of 'tbl1'
which seems the case we want to disallow(both parent and child table in
publication). So, When ATTACH, I thought we might need to check allthe parent
tables to see if any parent table is in the same publication which the
table to
be attached is also belongs to. Does it make sense ?
I don't think creating or attaching a partition of a table that is
present in a publish_via_partition_root=false actually adds the
partition to pg_publication_rel, the base catalog. A partition's
membership in the publication is implicit, unless of course you add it
to the publication explicitly, like all the examples we have been
discussing. I guess we're only arguing about the problems with the
pg_publication_tables view, which does expand the partitioned table to
show the partitions that are not otherwise not present in the base
catalog.Maybe I didn't make it clear, I was trying to explain that it would be
complicated if we want to completely disallow specifying both child and
parent
table in the publication because of the ATTACH case I gave. In other
words, I
think it's fine to specify both child and parent table in the publication.
Ah okay. I see your point. A table that was not once a partition would be
in the catalog along with the unrelated parent table and if we were to
enforce the rule that a parent and a partition cannot be in the catalog at
the same time, we’d need to delete that table’s catalog record on ATTACHing
it as a partition.
Thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Wed, Nov 17, 2021 at 11:39 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Nov 16, 2021 at 10:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:create table tbl1 (a int) partition by range (a);
create table tbl1_part1 partition of tbl1 for values from (1) to (10);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);In the name of consistency, I think this situation should be an error --
I mean, if we detect that the user is trying to add both the partitioned
table *and* its partition, then all manner of things are possibly going
to go wrong in some way, so my inclination is to avoid it altogether.Is there any reason to allow that?
I think it could provide flexibility to users to later change
"publish_via_partition_root" option. Because when that option is
false, we use individual partitions schema to send changes and when it
is true, we use root table's schema to send changes. Added Amit L. to
know if he has any thoughts on this matter as he was the author of
this work?FWIW, I'm not sure that adding an error in this path, that is, when a
user adds both a partitioned table and its partitions to a
publication, helps much. As for the safety of allowing it, if you
look at get_rel_sync_entry(), which handles much of the complexity of
determining whether to publish a relation's changes and the schema to
use when doing so, you may be able to see that a partition being added
duplicatively is harmless, modulo any undiscovered bugs. At least as
far as the post-initial-sync replication functionality is concerned.What IS problematic is what a subscriber sees in the
pg_publication_tables view and the problem occurs only in the initial
sync phase, where the partition is synced duplicatively because of
being found in the view along with the parent in this case, that is,
when publish_via_partiiton_root is true. I was saying on the other
thread [1] that we should leave it up to the subscriber to decide what
to do when the view (duplicatively) returns both the parent and the
partition, citing the use case that a subscriber may want to replicate
the parent and the partition as independent tables. Though I now tend
to agree with Amit K that that may be such a meaningful and all that
common use case, and the implementation on the subscriber side would
have to be unnecessarily complex.So maybe it makes sense to just do what has been proposed --
de-duplicate partitions out of the pg_publication_tables view,
AFAICU, there are actually two problems related to
pg_publication_tables view that are being discussed: (a) when
'publish_via_partition_root' is true then it returns both parent and
child tables (provided both are part of publication), this can lead to
duplicate data on the subscriber; the fix for this is being discussed
in thread [1]/messages/by-id/CA+HiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ@mail.gmail.com. (b) when 'publish_via_partition_root' is false, then it
returns duplicate entries for child tables (provided both are part of
publication), this problem is being discussed in this thread.
As per the proposed patches, it seems both need a separate fix, we can
fix them together if we want but I am not sure. Is your understanding
the same?
unless
we know of a bigger problem that requires us to hack the subscriber
side of things too.
There is yet another issue that might need subscriber side change. See
the second issue summarized by Hou-San in the email[2]/messages/by-id/OS0PR01MB5716C756312959F293A822C794869@OS0PR01MB5716.jpnprd01.prod.outlook.com. I feel it is
better to tackle that separately.
Actually, I came to know of one such problem
while thinking about this today: when you ATTACH a partition to a
table that is present in a publish_via_partition_root=true
publication, it doesn't get copied via the initial sync even though
subsequent replication works just fine. The reason for that is that
the subscriber only syncs the partitions that are known at the time
when the parent is first synced, that too via the parent (as SELECT
<columns..> FROM parent), and then marks the parent as sync-done.
Refreshing the subscription after ATTACHing doesn't help, because the
subscriber can't see any partitions to begin with.
Sorry, it is not clear to me how this can lead to a problem. Even if
we just have the root table in the subscriber at the time of sync
later shouldn't copy get the newly attached partition's data and
replicate it to the required partition for
"publish_via_partition_root=true" case? Anyway, if this is a problem
we need to figure the solution for this separately.
[1]: /messages/by-id/CA+HiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ@mail.gmail.com
[2]: /messages/by-id/OS0PR01MB5716C756312959F293A822C794869@OS0PR01MB5716.jpnprd01.prod.outlook.com
--
With Regards,
Amit Kapila.
On Thu, Nov 18, 2021 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
(b) when 'publish_via_partition_root' is false, then it
returns duplicate entries for child tables (provided both are part of
publication), this problem is being discussed in this thread.
+1 to just focus on fixing the initial problem proposed i.e.
pg_get_publication_tables() outputting duplicate relid. IMO, all other
issues (if unrelated to this problem and the patch can go to another
new thread, if necessary).
Regards,
Bharath Rupireddy.
On Thu, Nov 18, 2021 at 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 17, 2021 at 11:39 AM Amit Langote <amitlangote09@gmail.com> wrote:
What IS problematic is what a subscriber sees in the
pg_publication_tables view and the problem occurs only in the initial
sync phase, where the partition is synced duplicatively because of
being found in the view along with the parent in this case, that is,
when publish_via_partiiton_root is true. I was saying on the other
thread [1] that we should leave it up to the subscriber to decide what
to do when the view (duplicatively) returns both the parent and the
partition, citing the use case that a subscriber may want to replicate
the parent and the partition as independent tables. Though I now tend
to agree with Amit K that that may be such a meaningful and all that
common use case, and the implementation on the subscriber side would
have to be unnecessarily complex.So maybe it makes sense to just do what has been proposed --
de-duplicate partitions out of the pg_publication_tables view,AFAICU, there are actually two problems related to
pg_publication_tables view that are being discussed: (a) when
'publish_via_partition_root' is true then it returns both parent and
child tables (provided both are part of publication), this can lead to
duplicate data on the subscriber; the fix for this is being discussed
in thread [1]. (b) when 'publish_via_partition_root' is false, then it
returns duplicate entries for child tables (provided both are part of
publication), this problem is being discussed in this thread.As per the proposed patches, it seems both need a separate fix, we can
fix them together if we want but I am not sure. Is your understanding
the same?
Actually, I'm seeing both (a) and (b) as more or less the same thing.
If publish_via_partition_root=true, we'd want to remove a partition
from the list if the root parent is also present. If false, we'd want
to remove a partition from the list because it'd also be added by way
of expanding the root.
I know that (a) causes the partition-double-sync problem that could be
fixed by partition OID de-duplication as proposed in the other thread.
As a comment on that, it'd be nice to see a test case added to
src/test/subscription suite in that patch, because
partition-double-sync is the main problem statement I'd think.
Like Bharath said, I don't see (b) as much of a problem, because the
subscriber applies DISTINCT when querying pg_publication_tables()
anyway. Other users, if any out there, may be doing the same by
following core subscription code's example. That said, I am not
opposed to applying the patch being proposed here, though I guess we
don't have any src/test/subscription test case for this one? As in,
do we know of any replication (initial/streaming) misbehavior caused
by the duplicate partition OIDs in this case or is the only problem
that pg_publication_tables output looks odd?
unless
we know of a bigger problem that requires us to hack the subscriber
side of things too.There is yet another issue that might need subscriber side change. See
the second issue summarized by Hou-San in the email[2]. I feel it is
better to tackle that separately.Actually, I came to know of one such problem
while thinking about this today: when you ATTACH a partition to a
table that is present in a publish_via_partition_root=true
publication, it doesn't get copied via the initial sync even though
subsequent replication works just fine. The reason for that is that
the subscriber only syncs the partitions that are known at the time
when the parent is first synced, that too via the parent (as SELECT
<columns..> FROM parent), and then marks the parent as sync-done.
Refreshing the subscription after ATTACHing doesn't help, because the
subscriber can't see any partitions to begin with.Sorry, it is not clear to me how this can lead to a problem. Even if
we just have the root table in the subscriber at the time of sync
later shouldn't copy get the newly attached partition's data and
replicate it to the required partition for
"publish_via_partition_root=true" case?
Not sure if you thought otherwise but I am talking about attaching a
new partition into the partition tree on the *publication side*.
The problematic case is attaching the partition *after* the subscriber
has already marked the root parent as synced and/or ready for
replication. Refreshing the subscription doesn't help it discover the
newly attached partition, because a publish_via_partition_root only
ever tells about the root parent, which would be already synced, so
the subscriber would think there's nothing to copy.
Anyway, if this is a problem
we need to figure the solution for this separately.
Sure, we might need to do that after all. Though it might be a good
idea to be sure that we won't need to reconsider the fix we push for
the issue(s) being discussed here and elsewhere, because I suspect
that the solution to the problem I mentioned is likely to involve
tweaking pg_publication_tables view output.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Nov 18, 2021 at 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
AFAICU, there are actually two problems related to
pg_publication_tables view that are being discussed: (a) when
'publish_via_partition_root' is true then it returns both parent and
child tables (provided both are part of publication), this can lead to
duplicate data on the subscriber; the fix for this is being discussed
in thread [1]. (b) when 'publish_via_partition_root' is false, then it
returns duplicate entries for child tables (provided both are part of
publication), this problem is being discussed in this thread.As per the proposed patches, it seems both need a separate fix, we can
fix them together if we want but I am not sure. Is your understanding
the same?Actually, I'm seeing both (a) and (b) as more or less the same thing.
If publish_via_partition_root=true, we'd want to remove a partition
from the list if the root parent is also present. If false, we'd want
to remove a partition from the list because it'd also be added by way
of expanding the root.I know that (a) causes the partition-double-sync problem that could be
fixed by partition OID de-duplication as proposed in the other thread.
As a comment on that, it'd be nice to see a test case added to
src/test/subscription suite in that patch, because
partition-double-sync is the main problem statement I'd think.Like Bharath said, I don't see (b) as much of a problem, because the
subscriber applies DISTINCT when querying pg_publication_tables()
anyway. Other users, if any out there, may be doing the same by
following core subscription code's example.
Hmm, I don't think we can assume that current or future users of
pg_publication_tables() will use it with DISTINCT unless we specify in
the specs/docs. However, I guess we might want to do it only for HEAD
as there is no direct field report for this and anyway we have some
workaround for this.
That said, I am not
opposed to applying the patch being proposed here, though I guess we
don't have any src/test/subscription test case for this one?
Yeah, we can add tests for this and the other case.
As in,
do we know of any replication (initial/streaming) misbehavior caused
by the duplicate partition OIDs in this case or is the only problem
that pg_publication_tables output looks odd?
The latter one but I think either we should document this or change it
as we can't assume users will follow what subscriber-side code does.
unless
we know of a bigger problem that requires us to hack the subscriber
side of things too.There is yet another issue that might need subscriber side change. See
the second issue summarized by Hou-San in the email[2]. I feel it is
better to tackle that separately.The problematic case is attaching the partition *after* the subscriber
has already marked the root parent as synced and/or ready for
replication. Refreshing the subscription doesn't help it discover the
newly attached partition, because a publish_via_partition_root only
ever tells about the root parent, which would be already synced, so
the subscriber would think there's nothing to copy.
Okay, I see this could be a problem but I haven't tried to reproduce it.
Anyway, if this is a problem
we need to figure the solution for this separately.Sure, we might need to do that after all. Though it might be a good
idea to be sure that we won't need to reconsider the fix we push for
the issue(s) being discussed here and elsewhere, because I suspect
that the solution to the problem I mentioned is likely to involve
tweaking pg_publication_tables view output.
Okay, so we have four known problems in the same area. The first has
been reported at the start of this thread, then the two as summarized
by Hou-San in his email [1]/messages/by-id/OS0PR01MB5716C756312959F293A822C794869@OS0PR01MB5716.jpnprd01.prod.outlook.com and the fourth one is the attach of
partitions after initial sync as you mentioned. It makes sense to have
some high-level idea on how to solve each of the problems before
pushing a fix for any one particular problem as one fix could have an
impact on other fixes. I'll think over it but please do let me know if
you have any ideas.
[1]: /messages/by-id/OS0PR01MB5716C756312959F293A822C794869@OS0PR01MB5716.jpnprd01.prod.outlook.com
--
With Regards,
Amit Kapila.
On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
The problematic case is attaching the partition *after* the subscriber
has already marked the root parent as synced and/or ready for
replication. Refreshing the subscription doesn't help it discover the
newly attached partition, because a publish_via_partition_root only
ever tells about the root parent, which would be already synced, so
the subscriber would think there's nothing to copy.Okay, I see this could be a problem but I haven't tried to reproduce it.
Anyway, if this is a problem
we need to figure the solution for this separately.Sure, we might need to do that after all. Though it might be a good
idea to be sure that we won't need to reconsider the fix we push for
the issue(s) being discussed here and elsewhere, because I suspect
that the solution to the problem I mentioned is likely to involve
tweaking pg_publication_tables view output.
I have thought about this problem and I see two possibilities for a
solution (a) We could provide a new option say 'truncate' (something
on lines proposed here [1]/messages/by-id/CF3B6672-2A43-4204-A60A-68F359218A9B@endpoint.com) which would truncate the table(s) and
change its status to 'i' in the pg_subscription_rel, this would allow
the newly added partition to be synced after refresh. This could lead
to a large copy in such a case. (b) We could somehow get and store all
the partition info from the publisher-side on the subscriber-side
while initial sync (say in new system table
pg_subscription_rel_members). Now, after the refresh, if this list
changes, we can allow to just get the data of that particular
partition but I guess it would mean that we need to store oids of the
publisher which might or might not be safe considering oids can
wraparound before the refresh.
Do you have any other ideas?
One more thing you mentioned is that the initial sync won't work after
refresh but later changes will be replicated but I noticed that later
changes also don't get streamed till we restart the subscriber server.
I am not sure but we might not be invalidating apply workers cache due
to which it didn't notice the same.
[1]: /messages/by-id/CF3B6672-2A43-4204-A60A-68F359218A9B@endpoint.com
--
With Regards,
Amit Kapila.
On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
As in,
do we know of any replication (initial/streaming) misbehavior caused
by the duplicate partition OIDs in this case or is the only problem
that pg_publication_tables output looks odd?The latter one but I think either we should document this or change it
as we can't assume users will follow what subscriber-side code does.
On second thought, I agree that de-duplicating partitions from this
view is an improvement.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
The problematic case is attaching the partition *after* the subscriber
has already marked the root parent as synced and/or ready for
replication. Refreshing the subscription doesn't help it discover the
newly attached partition, because a publish_via_partition_root only
ever tells about the root parent, which would be already synced, so
the subscriber would think there's nothing to copy.Okay, I see this could be a problem but I haven't tried to reproduce it.
Anyway, if this is a problem
we need to figure the solution for this separately.Sure, we might need to do that after all. Though it might be a good
idea to be sure that we won't need to reconsider the fix we push for
the issue(s) being discussed here and elsewhere, because I suspect
that the solution to the problem I mentioned is likely to involve
tweaking pg_publication_tables view output.I have thought about this problem and I see two possibilities for a
solution (a) We could provide a new option say 'truncate' (something
on lines proposed here [1]) which would truncate the table(s) and
change its status to 'i' in the pg_subscription_rel, this would allow
the newly added partition to be synced after refresh. This could lead
to a large copy in such a case.
Maybe I am missing something about the proposal, though I'd think a
more automatic solution would be better, something that doesn't need
to rely on an unrelated feature.
(b) We could somehow get and store all
the partition info from the publisher-side on the subscriber-side
while initial sync (say in new system table
pg_subscription_rel_members). Now, after the refresh, if this list
changes, we can allow to just get the data of that particular
partition but I guess it would mean that we need to store oids of the
publisher which might or might not be safe considering oids can
wraparound before the refresh.Do you have any other ideas?
I thought that the idea I had earlier mentioned at [1]/messages/by-id/CA+HiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ@mail.gmail.com may be useful,
which I can see is similar to your idea (b). I also suspect that it
can be implemented without needing a separate catalog and storing
publication-side relation OIDs in the subscription-side catalog,
though maybe I haven't thought hard enough.
One more thing you mentioned is that the initial sync won't work after
refresh but later changes will be replicated but I noticed that later
changes also don't get streamed till we restart the subscriber server.
I am not sure but we might not be invalidating apply workers cache due
to which it didn't notice the same.
Oh, that sounds odd and, as you appear to say, a separate problem. I'll check.
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CA+HiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ@mail.gmail.com