Initial Schema Sync for Logical Replication
Hi Everyone,
I am working on the initial schema sync for logical replication. Currently, users have to
manually create the schema on the subscriber side. The aim of this feature is to add an option to
CREATE SUBSCRIPTION so that schema sync can be automatic. I am sharing the design doc below,
but there are some corner cases where the design does not work. Please share your opinion
on whether the design can be improved so that we can get rid of the corner cases. This design is loosely
based on pglogical.
DDL replication is required for this feature.
(/messages/by-id/CAAD30U+pVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo=OjdQGJ9g@mail.gmail.com)
SQL Changes:-
CREATE SUBSCRIPTION subscription_name
CONNECTION 'conninfo'
PUBLICATION publication_name [, ...]
[ WITH ( subscription_parameter [= value] [, ... ] ) ]
sync_initial_schema (enum) will be added as a subscription_parameter.
It can have 3 values:-
TABLES, ALL, NONE (default)
With ALL, everything will be synced, including global objects.
Restrictions :- sync_initial_schema=ALL can only be used with a publication created FOR ALL TABLES.
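For illustration, the proposed option would be used like this (the subscription, connection string and publication names are only examples; the parameter is the one proposed above):
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (sync_initial_schema = 'TABLES');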
Design:-
Publisher :-
The publisher has to implement a `SHOW CREATE TABLE_NAME` command; this table definition will be used by
the subscriber to create the exact schema of the table on the subscriber side. An alternative would
be to do this on the subscriber side itself: we could create a function similar to
describeOneTableDetails and call it from the subscriber. We also need to maintain the same ownership
as on the publisher.
The publisher should also have publication of DDL commands turned on.
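As a rough sketch of what such a publisher-side facility could return (the function name pg_get_tabledef is hypothetical, not an existing API, and the output is only illustrative):
SELECT pg_get_tabledef('public.t1'::regclass);
-- could return something like:
--   CREATE TABLE public.t1 (a integer NOT NULL, b text);
--   ALTER TABLE public.t1 OWNER TO table_owner;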
Subscriber :-
1. In CreateSubscription(), when we create the replication slot (walrcv_create_slot()), we should
use CRS_EXPORT_SNAPSHOT so that we can use this snapshot later in pg_dump.
2. Now we can call pg_dump with the above snapshot from CreateSubscription(). This happens inside the
opts.connect && opts.create_slot if statement. If we fail at this step we have to drop
the replication slot and create a new one, because we need a snapshot and creating a
replication slot is the way to get one. The reason for running pg_dump with the above
snapshot is that we don't want to execute the DDLs in the WAL twice. With the above snapshot we
get a state of the database from before the replication slot's origin, and any changes after
the snapshot will be in the WAL.
We will save the pg_dump output into a file (custom archive format), so the pg_dump invocation will be similar to
pg_dump --connection_string --schema-only --snapshot=xyz -Fc --file initSchema
If sync_initial_schema=TABLES we don't have to call pg_dump/restore at all; the tablesync process
will take care of it.
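To illustrate how an exported snapshot ties the dump to a consistent point (in the design the snapshot would come from walrcv_create_slot() with CRS_EXPORT_SNAPSHOT; pg_export_snapshot() is used here only to demonstrate the mechanism, and all names are examples):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns a snapshot id such as '00000003-00000002-1'
-- while this transaction is still open, run from a shell:
--   pg_dump "dbname=postgres" --schema-only --snapshot='00000003-00000002-1' -Fc --file initSchema
COMMIT;   -- only after pg_dump has finished; the exported snapshot is invalid afterwards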
3. If we have to sync global objects we need to call pg_dumpall --globals-only as well. But pg_dumpall
does not support the --snapshot option, so if a user creates a new global object between creation
of the replication slot and the pg_dumpall run, that global object will be created twice
on the subscriber, which will make the apply process error out.
4. walrcv_disconnect should be called only after pg_dump has finished, otherwise the snapshot will
no longer be valid.
5. Users with only the REPLICATION role cannot run pg_dump, so the replication user has to be a
superuser. This is a major problem.
postgres=# create role s4 WITH LOGIN Replication;
CREATE ROLE
$ pg_dump postgres -s -U s4
pg_dump: error: query failed: ERROR: permission denied for table t1
pg_dump: detail: Query was: LOCK TABLE public.t1, public.t2 IN ACCESS SHARE MODE
6. The pg_subscription_rel catalog column srsubstate will have one more state,
SUBREL_STATE_CREATE 'c'. If sync_initial_schema is enabled we will set the table state to 'c'.
The above 6 steps will be done even if the subscription is not enabled, as long as connect is true.
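For example (a sketch only; 'c' would be the new SUBREL_STATE_CREATE value alongside the existing srsubstate values), the tables still waiting for schema creation could be listed with:
SELECT srrelid::regclass AS relation, srsubstate
FROM pg_subscription_rel
WHERE srsubstate = 'c';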
7. The leader apply process should check whether the initSchema file exists; if so, it should
call pg_restore. We are not using the --pre-data and --post-data sections as is done in
pglogical, because the post-data section is meant for tables that already have data, but we will
fill the tables with data only at a later stage. pg_restore can be called like this
pg_restore --connection_string -1 file_name
The -1 option executes every command inside a single transaction, so if there is any error
everything will be rolled back.
pg_restore should be called quite early in the apply process code, before any tablesync
process can be created.
Instead of checking whether the file exists, the pg_subscription catalog could be extended with a column
SyncInitialSchema, and the apply process would then check SyncInitialSchema == SYNC_PENDING.
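A sketch of that alternative (the column name subinitschemasync and the value 'p' for SYNC_PENDING are hypothetical; no such column exists today):
SELECT subname
FROM pg_subscription
WHERE subinitschemasync = 'p';   -- leader apply worker would run pg_restore only for these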
8. The tablesync process should check the state of the table; if it is SUBREL_STATE_CREATE, it should
get the latest definition from the publisher and recreate the table (we have to recreate
the table even if there are no changes). Then it should go into copy-table mode as usual.
It might seem that tablesync is duplicating work already done by pg_restore; we are doing
it this way because of concurrent DDL and the REFRESH PUBLICATION command.
Concurrent DDL :-
A user can execute a DDL command on table t1 at the same time the subscriber is trying to sync
it; pictorial representation: https://imgur.com/a/ivrIEv8 [1]
When the tablesync process makes its connection to the publisher, the table state it sees there
can be in the future with respect to the schema restored on the subscriber, which can introduce conflicts.
For example:-
CASE 1:- { The publisher removed column b from table t1 while the subscriber was doing pg_restore
(or at any point in the concurrent DDL window described in picture [1]). When the tablesync
process starts its transaction on the publisher, it will request data of table t1
including column b, which no longer exists on the publisher.} That is why the tablesync process
asks for the latest definition.
If we say that we will delay the tablesync worker until all the DDL related to table t1 has been
applied by the apply process, we can still have a window in which the publisher issues a DDL
command just before tablesync starts its transaction, thereby making the tablesync and
publisher table definitions incompatible (thanks to Masahiko for pointing out this race
condition).
The apply process will skip all DDL/DML related to table t1, and tablesync will apply those
in the catchup phase.
There is one remaining issue, though: what will happen to views or functions which depend on the table?
I think they should wait until the table state is > SUBREL_STATE_CREATE (meaning we have the latest
schema definition from the publisher).
There might be corner cases to this approach, or maybe there is a better way to handle concurrent DDL.
One simple solution might be to disallow DDL on the publisher until all the schema is
synced and all tables have state >= SUBREL_STATE_DATASYNC (we can hit the CASE 1 issue
even with DDL replication, so we have to wait until all the tables have reached
SUBREL_STATE_DATASYNC). That might be a big window for big databases.
Refresh publication :-
In refresh publication, the subscriber does not create a new replication slot, hence we can't run
pg_dump with a snapshot which starts from the origin (maybe this is not an issue at all). In this case
it makes more sense for the tablesync worker to do the schema sync.
If the community is happy with the above design, I can start working on a prototype.
Credits :- This design is inspired by pglogical. Also thanks to Zane, Masahiko, and Amit for reviewing earlier designs.
Regards
Sachin Kumar
Amazon Web Services
Hi,
I have a couple of questions.
Q1.
What happens if the subscriber already has some tables present? For
example, I did not see the post saying anything like "Only if the
table does not already exist then it will be created".
On the contrary, the post seemed to say SUBREL_STATE_CREATE 'c' would
*always* be set when this subscriber mode is enabled. And then it
seemed to say the table would *always* get re-created by the tablesync
in this new mode.
Won't this cause problems
- if the user wanted a slightly different subscriber-side table? (eg
some extra columns on the subscriber-side table)
- if there was some pre-existing table data on the subscriber-side
table that you now are about to re-create and clobber?
Or does the idea intend that the CREATE TABLE DDL that will be
executed is like "CREATE TABLE ... IF NOT EXISTS"?
~~~
Q2.
The post says. "DDL replication is required for this feature". And "It
should also have turned on publication of DDL commands."
It wasn't entirely clear to me why those must be a requirement. Is
that just to make implementation easier?
Sure, I see that the idea might have some (or maybe a lot?) of common
internal code with the table DDL replication work, but OTOH an
auto-create feature for subscriber tables seems like it might be a
useful feature to have regardless of the value of the publication
'ddl' parameter.
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
Hi Peter,
Hi,
I have a couple of questions.
Q1.
What happens if the subscriber already has some tables present? For
example, I did not see the post saying anything like "Only if the table does
not already exist then it will be created".
My assumption was that if the subscriber is doing initial schema sync, it does not have
any conflicting database objects.
On the contrary, the post seemed to say SUBREL_STATE_CREATE 'c' would
*always* be set when this subscriber mode is enabled. And then it seemed
to say the table would *always* get re-created by the tablesync in this new
mode.
Right
Won't this cause problems
- if the user wanted a slightly different subscriber-side table? (eg some extra
columns on the subscriber-side table)
- if there was some pre-existing table data on the subscriber-side table that
you now are about to re-create and clobber?
Or does the idea intend that the CREATE TABLE DDL that will be executed is
like "CREATE TABLE ... IF NOT EXISTS"?
pg_dump does not support --if-not-exists, but I think such an option can be added so that we get a
dump with IF NOT EXISTS.
On the subscriber side we get the table OID list; we can use this to change table_state
to SUBREL_STATE_INIT so that the table won't be recreated.
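Going back to the IF NOT EXISTS idea above, a sketch of what the restored DDL could look like if such a pg_dump option existed (the table definition is just an example); a pre-existing subscriber-side table would then be skipped instead of recreated:
CREATE TABLE IF NOT EXISTS public.t1 (
    a integer PRIMARY KEY,
    b text
);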
~~~
Q2.
The post says. "DDL replication is required for this feature". And "It should
also have turned on publication of DDL commands."It wasn't entirely clear to me why those must be a requirement. Is that just to
make implementation easier?
DDL replication is needed to handle concurrent DDL, so that we don't have to
worry about schema changes happening at the same time the subscriber is initializing.
If we can block the publisher so that it does not do DDL, or the subscriber can simply
error out if it sees conflicting table information, then we don't need to use DDL
replication.
Regards
Sachin
Sure, I see that the idea might have some (or maybe a lot?) of common
internal code with the table DDL replication work, but OTOH an auto-create
feature for subscriber tables seems like it might be a useful feature to have
regardless of the value of the publication 'ddl' parameter.
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
On 2023-Mar-15, Kumar, Sachin wrote:
1. In CreateSubscription() when we create replication slot(walrcv_create_slot()), should
use CRS_EXPORT_SNAPSHOT, So that we can use this snapshot later in the pg_dump.
2. Now we can call pg_dump with above snapshot from CreateSubscription.
Overall I'm not on board with the idea that logical replication would
depend on pg_dump; that seems like it could run into all sorts of
trouble (what if calling external binaries requires additional security
setup? what about pg_hba connection requirements? what about
max_connections in tight circumstances?).
It would be much better, I think, to handle this internally in the
publisher instead: similar to how DDL sync would work, except it'd
somehow generate the CREATE statements from the existing tables instead
of waiting for DDL events to occur. I grant that this does require
writing a bunch of new code for each object type, a lot of which would
duplicate the pg_dump logic, but it would probably be a lot more robust.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Thu, Mar 16, 2023 at 10:27 PM Kumar, Sachin <ssetiya@amazon.com> wrote:
Hi,
I have a couple of questions.
Q1.
What happens if the subscriber already has some tables present? For
example, I did not see the post saying anything like "Only if the table does
not already exist then it will be created".My assumption was the if subscriber is doing initial schema sync , It does not have
any conflicting database objects.
Can't we simply error out in such a case with "obj already exists"?
This would be similar to how we deal with conflicting rows with
unique/primary keys.
--
With Regards,
Amit Kapila.
Hi Amit,
From: Amit Kapila <amit.kapila16@gmail.com>
Hi,
I have a couple of questions.
Q1.
What happens if the subscriber already has some tables present? For
example, I did not see the post saying anything like "Only if the
table does not already exist then it will be created".My assumption was the if subscriber is doing initial schema sync , It
does not have any conflicting database objects.Can't we simply error out in such a case with "obj already exists"?
This would be similar to how we deal with conflicting rows with unique/primary
keys.
Right, this is the default behaviour. We will run pg_restore with --single-transaction,
so if we get an error while executing a CREATE TABLE the whole pg_restore will fail and
the user will be notified.
Regards
Sachin
Hi Alvaro,
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
On 2023-Mar-15, Kumar, Sachin wrote:
1. In CreateSubscription() when we create replication slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we can use this snapshot later in the pg_dump.
2. Now we can call pg_dump with above snapshot from CreateSubscription.
Overall I'm not on board with the idea that logical replication would depend on
pg_dump; that seems like it could run into all sorts of trouble (what if calling
external binaries requires additional security setup? what about pg_hba
connection requirements? what about max_connections in tight
circumstances?).
what if calling external binaries requires additional security setup
I am not sure what kind of security restriction would apply in this case; maybe the pg_dump
binary can be changed?
what about pg_hba connection requirements?
We will use the same connection string which the subscriber process uses to connect to
the publisher.
what about max_connections in tight circumstances?
Right, that might be an issue, but I don't think it will be a big one. We will create the dump
of the database in the CreateSubscription() function itself, so if we have already reached
max_connections when pg_dump is called, tablesync would not have been successful anyway.
It would be much better, I think, to handle this internally in the publisher instead:
similar to how DDL sync would work, except it'd somehow generate the CREATE
statements from the existing tables instead of waiting for DDL events to occur. I
grant that this does require writing a bunch of new code for each object type, a
lot of which would duplicate the pg_dump logic, but it would probably be a lot
more robust.
Agreed, but we might have a lot of code duplication; essentially almost all of the pg_dump
code needs to be duplicated, which might cause issues when modifying/adding new
DDLs.
I am not sure, but maybe it is possible to move the code pg_dump depends on into the common/ folder
to avoid duplication.
Regards
Sachin
On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
On 2023-Mar-15, Kumar, Sachin wrote:
1. In CreateSubscription() when we create replication slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we can use this snapshot later in the pg_dump.
2. Now we can call pg_dump with above snapshot from CreateSubscription.
Overall I'm not on board with the idea that logical replication would depend on
pg_dump; that seems like it could run into all sorts of trouble (what if calling
external binaries requires additional security setup? what about pg_hba
connection requirements? what about max_connections in tight
circumstances?).
what if calling external binaries requires additional security setup
I am not sure what kind of security restriction would apply in this case, maybe pg_dump
binary can be changed ?
Using pg_dump as part of this implementation is not acceptable because we
expect the backend to be decoupled from the client. Besides that, pg_dump
provides all table dependencies (such as tablespaces, privileges, security
labels, comments); not all of these dependencies should be replicated. You would have to
exclude them by removing these objects from the TOC before running pg_restore or by
adding a few pg_dump options to exclude these objects. Another issue is related
to version differences. Let's say the publisher is on a version ahead of the
subscriber's; a new table syntax can easily break your logical
replication setup. IMO pg_dump doesn't seem like a good solution for initial
synchronization.
Instead, the backend should provide infrastructure to obtain the required DDL
commands for the specific (set of) tables. This can work around the issues from
the previous paragraph:
* you can selectively choose dependencies;
* it doesn't require additional client packages;
* you don't need to worry about different versions.
This infrastructure can also be useful for other use cases such as:
* client tools that provide create commands (such as psql, pgAdmin);
* other logical replication solutions;
* other logical backup solutions.
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <euler@eulerto.com> wrote:
On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
On 2023-Mar-15, Kumar, Sachin wrote:
1. In CreateSubscription() when we create replication slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we can use this snapshot later in the pg_dump.
2. Now we can call pg_dump with above snapshot from CreateSubscription.
Overall I'm not on board with the idea that logical replication would depend on
pg_dump; that seems like it could run into all sorts of trouble (what if calling
external binaries requires additional security setup? what about pg_hba
connection requirements? what about max_connections in tight
circumstances?).
what if calling external binaries requires additional security setup
I am not sure what kind of security restriction would apply in this case, maybe pg_dump
binary can be changed ?
Using pg_dump as part of this implementation is not acceptable because we
expect the backend to be decoupled from the client. Besides that, pg_dump
provides all table dependencies (such as tablespaces, privileges, security
labels, comments); not all dependencies shouldn't be replicated.
I agree that in the initial version we may not support syncing all
objects, but why shouldn't that be possible in later versions?
You should
exclude them removing these objects from the TOC before running pg_restore or
adding a few pg_dump options to exclude these objects. Another issue is related
to different version. Let's say the publisher has a version ahead of the
subscriber version, a new table syntax can easily break your logical
replication setup. IMO pg_dump doesn't seem like a good solution for initial
synchronization.
Instead, the backend should provide infrastructure to obtain the required DDL
commands for the specific (set of) tables. This can work around the issues from
the previous paragraph:
...
* don't need to worry about different versions.
AFAICU some of the reasons why pg_dump is not allowed to dump from the
newer version are as follows: (a) there could be more columns in the
newer version of the system catalog and then Select * type of stuff
won't work because the client won't have knowledge of additional
columns. (b) the newer version could have new features (represented by
say new columns in existing catalogs or new catalogs) that the older
version of pg_dump has no knowledge of and will fail to get that data
and hence an inconsistent dump. The subscriber will easily be not in
sync due to that.
Now, how do we avoid these problems even if we have our own version of
functionality similar to pg_dump for selected objects? I guess we will
face similar problems. If so, we may need to deny schema sync in any
such case.
--
With Regards,
Amit Kapila.
On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
AFAICU some of the reasons why pg_dump is not allowed to dump from the
newer version are as follows: (a) there could be more columns in the
newer version of the system catalog and then Select * type of stuff
won't work because the client won't have knowledge of additional
columns. (b) the newer version could have new features (represented by
say new columns in existing catalogs or new catalogs) that the older
version of pg_dump has no knowledge of and will fail to get that data
and hence an inconsistent dump. The subscriber will easily be not in
sync due to that.
Now, how do we avoid these problems even if we have our own version of
functionality similar to pg_dump for selected objects? I guess we will
face similar problems.
Right. I think that such functionality needs to return DDL commands
that can be executed on the requested version.
If so, we may need to deny schema sync in any such case.
Yes. Do we have any concrete use case where the subscriber is an older
version, in the first place?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yes. Do we have any concrete use case where the subscriber is an older
version, in the first place?
As per my understanding, it is mostly due to the reason that it can
work today. Today, during an off-list discussion with Jonathan on this
point, he pointed me to a similar incompatibility in MySQL
replication. See the "SQL incompatibilities" section in doc[1]https://dev.mysql.com/doc/refman/8.0/en/replication-compatibility.html. Also,
please note that this applies not only to initial sync but also to
schema sync during replication. I don't think it would be feasible to
keep such cross-version compatibility for DDL replication.
Having said above, I don't intend that we must use pg_dump from the
subscriber for the purpose of initial sync. I think the idea at this
stage is to primarily write a POC patch to see what difficulties we
may face. The other options that we could try out are (a) try to
duplicate parts of pg_dump code in some way (by extracting required
code) for the subscription's initial sync, or (b) have a common code
(probably as a library or some other way) for the required
functionality. There could be more possibilities that we may not have
thought of yet. But the main point is that for approaches other than
using pg_dump, we should consider ways to avoid duplicity of various
parts of its code. Due to this, I think before ruling out using
pg_dump, we should be clear about its risks and limitations.
Thoughts?
[1]: https://dev.mysql.com/doc/refman/8.0/en/replication-compatibility.html
[2]: /messages/by-id/CAAD30U+pVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo=OjdQGJ9g@mail.gmail.com
--
With Regards,
Amit Kapila.
On Wednesday, March 22, 2023 1:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Having said above, I don't intend that we must use pg_dump from the
subscriber for the purpose of initial sync. I think the idea at this stage is to
primarily write a POC patch to see what difficulties we may face. The other
options that we could try out are (a) try to duplicate parts of pg_dump code in
some way (by extracting required
code) for the subscription's initial sync, or (b) have a common code (probably
as a library or some other way) for the required functionality. There could be
more possibilities that we may not have thought of yet. But the main point is
that for approaches other than using pg_dump, we should consider ways to
avoid duplicity of various parts of its code. Due to this, I think before ruling out
using pg_dump, we should be clear about its risks and limitations.
I thought about some possible problems with the design of using pg_dump.
1) According to the design, it will internally call pg_dump when creating the
subscription, but that requires a powerful user to run pg_dump.
Currently, this may not be a problem because CREATE SUBSCRIPTION also requires
superuser. But people have recently discussed allowing non-superusers to
create subscriptions[1]; if that is accepted, then it seems not great to
internally use superuser to call pg_dump while the user creating the
subscription is a non-superuser.
2) I think it's possible that some cloud DB services don't allow users to use the
client commands (pg_dump, ...) directly, and the user that logs in to the
database may not have permission to execute the client commands.
[1]: /messages/by-id/20230308194743.23rmgjgwahh4i4rg@awork3.anarazel.de
Best Regards,
Hou zj
From: Amit Kapila <amit.kapila16@gmail.com>
Having said above, I don't intend that we must use pg_dump from the
subscriber for the purpose of initial sync. I think the idea at this stage is to
primarily write a POC patch to see what difficulties we may face. The other
options that we could try out are (a) try to duplicate parts of pg_dump code
in some way (by extracting required
code) for the subscription's initial sync, or (b) have a common code (probably
as a library or some other way) for the required functionality. There could be
more possibilities that we may not have thought of yet. But the main point is
that for approaches other than using pg_dump, we should consider ways to
avoid duplicity of various parts of its code. Due to this, I think before ruling
out using pg_dump, we should be clear about its risks and limitations.
Thoughts?
There is one more thing which needs to be considered even if we use pg_dump/pg_restore:
we still need to have a way to get the CREATE TABLE definition for tables if we want to support
concurrent DDL on the publisher.
8. TableSync process should check the state of table , if it is SUBREL_STATE_CREATE it should
get the latest definition from the publisher and recreate the table. (We have to recreate
the table even if there are no changes). Then it should go into copy table mode as usual.
Unless there is a different way to support concurrent DDL, or we go for blocking the publisher
until the initial sync is completed.
Regards
Sachin
On Wed, Mar 22, 2023 at 2:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
As per my understanding, it is mostly due to the reason that it can
work today. Today, during an off-list discussion with Jonathan on this
point, he pointed me to a similar incompatibility in MySQL
replication. See the "SQL incompatibilities" section in doc[1]. Also,
please note that this applies not only to initial sync but also to
schema sync during replication. I don't think it would be feasible to
keep such cross-version compatibility for DDL replication.
Makes sense to me.
Having said above, I don't intend that we must use pg_dump from the
subscriber for the purpose of initial sync. I think the idea at this
stage is to primarily write a POC patch to see what difficulties we
may face. The other options that we could try out are (a) try to
duplicate parts of pg_dump code in some way (by extracting required
code) for the subscription's initial sync, or (b) have a common code
(probably as a library or some other way) for the required
functionality. There could be more possibilities that we may not have
thought of yet. But the main point is that for approaches other than
using pg_dump, we should consider ways to avoid duplicity of various
parts of its code. Due to this, I think before ruling out using
pg_dump, we should be clear about its risks and limitations.
Thoughts?
Agreed. My biggest concern about approaches other than using pg_dump
is the same: the code duplication could increase the maintenance
costs. We should clarify in what respects using pg_dump is not a good
idea, and also analyze alternative ideas in depth.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Yes. Do we have any concrete use case where the subscriber is an older
version, in the first place?
As per my understanding, it is mostly due to the reason that it can
work today. Today, during an off-list discussion with Jonathan on this
point, he pointed me to a similar incompatibility in MySQL
replication. See the "SQL incompatibilities" section in doc[1]. Also,
please note that this applies not only to initial sync but also to
schema sync during replication. I don't think it would be feasible to
keep such cross-version compatibility for DDL replication.
I think it's possible to make DDL replication cross-version
compatible, by making the DDL deparser version-aware: the deparsed
JSON blob can have a PG version in it, and the destination server can
process the versioned JSON blob by transforming anything incompatible
according to the original version and its own version.
Regards,
Zane
On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:
Now, how do we avoid these problems even if we have our own version of
functionality similar to pg_dump for selected objects? I guess we will
face similar problems. If so, we may need to deny schema sync in any
such case.
There are 2 approaches for initial DDL synchronization:
1) generate the DDL command on the publisher, stream it and apply it as-is on
the subscriber;
2) generate a DDL representation (JSON, for example) on the publisher, stream
it, transform it into a DDL command on subscriber and apply it.
Option (1) is simpler and faster than option (2) because it does not
require an additional step (transformation). However, option (2) is more
flexible than option (1) because it allows you to create a DDL command even if a
feature was removed from the subscriber and the publisher version is less than
the subscriber version, or a feature was added to the publisher and the
publisher version is greater than the subscriber version. Of course there are
exceptions, and in those cases the transformation should be forbidden (this can be
controlled by the protocol version -- LOGICALREP_PROTO_FOOBAR_VERSION_NUM). A
decision must be made: simple/restrictive vs complex/flexible.
One of the main use cases for logical replication is migration (X -> Y where X
< Y). Postgres generally does not remove features but it might happen (such as
WITH OIDS syntax) and it would break the DDL replication (option 1). In the
downgrade case (X -> Y where X > Y), it might break the DDL replication if a
new syntax is introduced in X. Having said that, IMO option (1) is fragile if
we want to support DDL replication between different Postgres versions. It
might eventually work but there is no guarantee.
Per discussion [1], I think if we agree that Alvaro's DDL deparse patch is
the way to go with DDL replication, it seems wise that it should be used for
initial DDL synchronization as well.
[1]: /messages/by-id/CAA4eK1+w_dFytBiv3RxbOL76_noMzmX0QGTc8uS=bc2WaPVoow@mail.gmail.com
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Wed, Mar 15, 2023 at 11:12 PM Kumar, Sachin <ssetiya@amazon.com> wrote:
Concurrent DDL :-
User can execute a DDL command to table t1 at the same time when subscriber is trying to sync
it. pictorial representation https://imgur.com/a/ivrIEv8 [1]
In tablesync process, it makes a connection to the publisher and it sees the
table state which can be in future wrt to the publisher, which can introduce conflicts.
For example:-
CASE 1:- { Publisher removed the column b from the table t1 when subscriber was doing pg_restore
(or any point in concurrent DDL window described in picture [1] ), when tableSync
process will start transaction on the publisher it will see request data of table t1
including column b, which does not exist on the publisher.} So that is why tableSync process
asks for the latest definition.
If we say that we will delay tableSync worker till all the DDL related to table t1 is
applied by the applier process , we can still have a window when publisher issues a DDL
command just before tableSync starts its transaction, and therefore making tableSync and
publisher table definition incompatible (Thanks to Masahiko for pointing out this race
condition).
IIUC, this is possible only if tablesync process uses a snapshot
different than the snapshot we have used to perform the initial schema
sync, otherwise, this shouldn't be a problem. Let me try to explain my
understanding with an example (the LSNs used are just to explain the
problem):
1. Create Table t1(c1, c2); --LSN: 90
2. Insert t1 (1, 1); --LSN 100
3. Insert t1 (2, 2); --LSN 110
4. Alter t1 Add Column c3; --LSN 120
5. Insert t1 (3, 3, 3); --LSN 130
Now, say before starting tablesync worker, apply process performs
initial schema sync and uses a snapshot corresponding to LSN 100. Then
it starts tablesync process to allow the initial copy of data in t1.
Here, if the table sync process tries to establish a new snapshot, it
may get data till LSN 130 and when it will try to copy the same in
subscriber it will fail. Is my understanding correct about the problem
you described? If so, can't we allow tablesync process to use the same
exported snapshot as we used for the initial schema sync and won't
that solve the problem you described?
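For reference, a sketch of how an exported snapshot can be consumed by another session, which is the mechanism a tablesync worker would need in order to reuse the schema-sync snapshot (the snapshot id and table name are illustrative):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-00000002-1';
COPY public.t1 TO STDOUT;   -- sees only the rows visible as of the exported snapshot
COMMIT;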
Applier process will skip all DDL/DMLs related to the table t1 and tableSync will apply those
in Catchup phase.
Although there is one issue what will happen to views/ or functions which depend on the table
. I think they should wait till table_state is > SUBREL_STATE_CREATE (means we have the latest
schema definition from the publisher).
There might be corner cases to this approach or maybe a better way to handle concurrent DDL
One simple solution might be to disallow DDLs on the publisher till all the schema is
synced and all tables have state >= SUBREL_STATE_DATASYNC (We can have CASE 1: issue ,
even with DDL replication, so we have to wait till all the tables have table_state
SUBREL_STATE_DATASYNC). Which might be a big window for big databases.
Refresh publication :-
In refresh publication, subscriber does create a new replication slot hence , we can’t run
pg_dump with a snapshot which starts from origin(maybe this is not an issue at all). In this case
it makes more sense for tableSync worker to do schema sync.
Can you please explain this problem with some examples?
--
With Regards,
Amit Kapila.
On Thu, Mar 23, 2023 at 2:48 AM Euler Taveira <euler@eulerto.com> wrote:
On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:
Now, how do we avoid these problems even if we have our own version of
functionality similar to pg_dump for selected objects? I guess we will
face similar problems. If so, we may need to deny schema sync in any
such case.
There are 2 approaches for initial DDL synchronization:
1) generate the DDL command on the publisher, stream it and apply it as-is on
the subscriber;
2) generate a DDL representation (JSON, for example) on the publisher, stream
it, transform it into a DDL command on subscriber and apply it.
The option (1) is simpler and faster than option (2) because it does not
require an additional step (transformation). However, option (2) is more
flexible than option (1) because it allow you to create a DDL command even if a
feature was removed from the subscriber and the publisher version is less than
the subscriber version or a feature was added to the publisher and the
publisher version is greater than the subscriber version.
Is this practically possible? Say the publisher has a higher version
that has introduced a new object type corresponding to which it has
either a new catalog or some new columns in the existing catalog. Now,
I don't think the older version of the subscriber can modify the
command received from the publisher so that the same can be applied to
the subscriber because it won't have any knowledge of the new feature.
In the other case where the subscriber is of a newer version, we
anyway should be able to support it with pg_dump as there doesn't
appear to be any restriction with that; am I missing something?
One of the main use cases for logical replication is migration (X -> Y where X
< Y).
I don't think we need to restrict this case even if we decide to use pg_dump.
Per discussion [1], I think if we agree that the Alvaro's DDL deparse patch is
the way to go with DDL replication, it seems wise that it should be used for
initial DDL synchronization as well.
Even if we decide to use deparse approach, it would still need to
mimic stuff from pg_dump to construct commands based on only catalog
contents. I am not against using this approach but we shouldn't ignore
the duplicity required in this approach.
--
With Regards,
Amit Kapila.
From: Amit Kapila <amit.kapila16@gmail.com>
IIUC, this is possible only if tablesync process uses a snapshot different than the
snapshot we have used to perform the initial schema sync, otherwise, this
shouldn't be a problem. Let me try to explain my understanding with an example
(the LSNs used are just explain the
problem):
1. Create Table t1(c1, c2); --LSN: 90
2. Insert t1 (1, 1); --LSN 100
3. Insert t1 (2, 2); --LSN 110
4. Alter t1 Add Column c3; --LSN 120
5. Insert t1 (3, 3, 3); --LSN 130
Now, say before starting tablesync worker, apply process performs initial
schema sync and uses a snapshot corresponding to LSN 100. Then it starts
tablesync process to allow the initial copy of data in t1.
Here, if the table sync process tries to establish a new snapshot, it may get data
till LSN 130 and when it will try to copy the same in subscriber it will fail. Is my
understanding correct about the problem you described?
Right
If so, can't we allow
tablesync process to use the same exported snapshot as we used for the initial
schema sync and won't that solve the problem you described?
I think we won't be able to use the same snapshot because the transaction will have been committed.
In CreateSubscription() we can use the transaction snapshot from walrcv_create_slot()
till walrcv_disconnect() is called. (I am not sure about this part; maybe walrcv_disconnect()
commits internally?)
So somehow we need to keep this snapshot alive even after the transaction is committed (or delay committing
the transaction, but since we can have CREATE SUBSCRIPTION with ENABLED=FALSE, we can have a restart before
tableSync is able to use the same snapshot.)
Refresh publication :-
In refresh publication, subscriber does create a new replication slot
Typo-> subscriber does not
hence , we can’t run
pg_dump with a snapshot which starts from origin(maybe this is not an
issue at all). In this case
it makes more sense for tableSync worker to do schema sync.
Can you please explain this problem with some examples?
I think we can have the same issues as you mentioned:
a new table t1 is added to the publication, and the user does a refresh publication.
pg_dump/pg_restore restores the table definition. But before tableSync
can start, steps 2 to 5 happen on the publisher.
1. Create Table t1(c1, c2); --LSN: 90
2. Insert t1 (1, 1); --LSN 100
3. Insert t1 (2, 2); --LSN 110
4. Alter t1 Add Column c3; --LSN 120
5. Insert t1 (3, 3, 3); --LSN 130
And tablesync errors out.
There can be one more issue, since we took the pg_dump without a snapshot (wrt the replication slot)
(I am not 100 percent sure about this).
Let's imagine the apply process is lagging behind the publisher.
Events on the publisher:
1. alter t1 drop column c; LSN 100 <-- apply process tries to execute this DDL
2. alter t1 drop column d; LSN 110
3. insert into t1 values(..); LSN 120 <-- (refresh publication called) pg_dump/restore restores this version
The apply process executing 1 will fail because t1 does not have column c.
Regards
Sachin
On Thu, Mar 23, 2023, at 8:44 AM, Amit Kapila wrote:
Is this practically possible? Say the publisher has a higher version
that has introduced a new object type corresponding to which it has
either a new catalog or some new columns in the existing catalog. Now,
I don't think the older version of the subscriber can modify the
command received from the publisher so that the same can be applied to
the subscriber because it won't have any knowledge of the new feature.
In the other case where the subscriber is of a newer version, we
anyway should be able to support it with pg_dump as there doesn't
appear to be any restriction with that; am I missing something?
I think so (with some limitations). Since the publisher knows the subscriber
version, the publisher knows that the subscriber does not contain the new object
type, and it can then decide whether this case is critical (and reject the
replication) or optional (and silently not include the feature X, because it
is not essential for logical replication). If required, the transformation
should be done on the publisher.
Even if we decide to use deparse approach, it would still need to
mimic stuff from pg_dump to construct commands based on only catalog
contents. I am not against using this approach but we shouldn't ignore
the duplicity required in this approach.
It is fine to share code between pg_dump and this new infrastructure. However,
the old code should coexist to support older versions because the new set of
functions don't exist in older server versions. Hence, duplicity should exist
for a long time (if you consider that the current policy is to allow dump from
9.2, we are talking about 10 years or so). There are some threads [1][2] that
discussed this topic: provide a SQL command based on the catalog
representation. You can probably find other discussions searching for "pg_dump
library" or "getddl".
[1]: /messages/by-id/82EFF560-2A09-4C3D-81CC-A2A5EC438CE5@eggerapps.at
[2]: /messages/by-id/71e01949.2e16b.13df4707405.Coremail.shuai900217@126.com
--
Euler Taveira
EDB https://www.enterprisedb.com/