Multi-Master Logical Replication

Started by Peter Smith · almost 4 years ago · 35 messages · pgsql-hackers
#1 Peter Smith
smithpb2250@gmail.com

MULTI-MASTER LOGICAL REPLICATION

1.0 BACKGROUND

Let’s assume that a user wishes to set up a multi-master environment
so that a set of PostgreSQL instances (nodes) use logical replication
to share tables with every other node in the set.

We define this as a multi-master logical replication (MMLR) node-set.

<please refer to the attached node-set diagram>

1.1 ADVANTAGES OF MMLR

- Increases write scalability (e.g., all nodes can write arbitrary data).
- Allows load balancing
- Allows rolling updates of nodes (e.g., logical replication works
between different major versions of PostgreSQL).
- Improves the availability of the system (e.g., no single point of failure)
- Improves performance (e.g., lower latencies for geographically local nodes)

2.0 MMLR AND POSTGRESQL

It is already possible to configure a kind of MMLR set in PostgreSQL
15 using PUB/SUB, but it is very restrictive because it can only work
when no two nodes operate on the same table. This is because when two
nodes try to share the same table, a circular recursive problem
arises: Node1 replicates data to Node2, which is then replicated back
to Node1, and so on.

To prevent the circular recursive problem Vignesh is developing a
patch [1] that introduces new SUBSCRIPTION options "local_only" (for
publishing only data originating at the publisher node) and
"copy_data=force". Using this patch, we have created a script [2]
demonstrating how to set up all the above multi-node examples. An
overview of the necessary steps is given in the next section.

2.1 STEPS – Adding a new node N to an existing node-set

step 1. Prerequisites – Apply Vignesh’s patch [1]. All nodes in the
set must be visible to each other by a known CONNECTION. All shared
tables must already be defined on all nodes.

step 2. On node N do CREATE PUBLICATION pub_N FOR ALL TABLES

step 3. All other nodes then CREATE SUBSCRIPTION to PUBLICATION pub_N
with "local_only=on, copy_data=on" (this will replicate initial data
from the node N tables to every other node).

step 4. On node N, temporarily ALTER PUBLICATION pub_N to prevent
replication of 'truncate', then TRUNCATE all tables of node N, then
re-allow replication of 'truncate'.

step 5. On node N do CREATE SUBSCRIPTION to the publications of all
other nodes in the set
5a. Specify "local_only=on, copy_data=force" for exactly one of the
subscriptions (this will make the node N tables now have the same
data as the other nodes)
5b. Specify "local_only=on, copy_data=off" for all other subscriptions.

step 6. Result - Now changes to any table on any node should be
replicated to every other node in the set.

Note: Steps 4 and 5 need to be done within the same transaction to
avoid loss of data in case of some command failure. (Because we can't
perform create subscription in a transaction, we need to create the
subscription in a disabled mode first and then enable it in the
transaction).
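
The steps above can be sketched in SQL from the point of view of a new
node N joining an existing set {A, B}. This is only a sketch: the
connection strings and object names are illustrative, the "local_only"
and "copy_data = force" options come from the proposed patch [1], and
per the Note above the TRUNCATE and subscription creation would really
be coordinated in one transaction with initially disabled subscriptions.

```sql
-- step 2, on node N:
CREATE PUBLICATION pub_N FOR ALL TABLES;

-- step 3, on node A (and likewise on node B):
CREATE SUBSCRIPTION sub_A_N
    CONNECTION 'dbname=foo host=nodeN user=repuser'
    PUBLICATION pub_N WITH (local_only = on, copy_data = on);

-- step 4, on node N: truncate without replicating the truncate
ALTER PUBLICATION pub_N SET (publish = 'insert, update, delete');
TRUNCATE t1, t2;  -- all the shared tables
ALTER PUBLICATION pub_N SET (publish = 'insert, update, delete, truncate');

-- step 5, on node N: exactly one subscription uses copy_data = force
CREATE SUBSCRIPTION sub_N_A
    CONNECTION 'dbname=foo host=nodeA user=repuser'
    PUBLICATION pub_A WITH (local_only = on, copy_data = force);
CREATE SUBSCRIPTION sub_N_B
    CONNECTION 'dbname=foo host=nodeB user=repuser'
    PUBLICATION pub_B WITH (local_only = on, copy_data = off);
```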

2.2 DIFFICULTIES

Notice that it becomes increasingly complex to configure MMLR manually
as the number of nodes in the set increases. There are also some
difficulties such as
- dealing with initial table data
- coordinating the timing to avoid concurrent updates
- getting the SUBSCRIPTION options for copy_data exactly right.

3.0 PROPOSAL

To make the MMLR setup simpler, we propose to create a new API that
will hide all the step details and remove from the user the burden of
getting every step exactly right.

3.1 MOTIVATION
- MMLR (sharing the same tables) is not currently possible
- Vignesh's patch [1] makes MMLR possible, but the manual setup is
still quite difficult
- An MMLR implementation can solve the timing problems (e.g., using
Database Locking)

3.2 API

Preferably the API would be implemented as new SQL functions in
PostgreSQL core, however, implementation using a contrib module or
some new SQL syntax may also be possible.

SQL functions will be like below:
- pg_mmlr_set_create = create a new set, and give it a name
- pg_mmlr_node_attach = attach the current node to a specified set
- pg_mmlr_node_detach = detach a specified node from a specified set
- pg_mmlr_set_delete = delete a specified set

For example, internally the pg_mmlr_node_attach API function would
execute the equivalent of all the CREATE PUBLICATION, CREATE
SUBSCRIPTION, and TRUNCATE steps described above.
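
For illustration, usage might look like the following; the argument
lists are not specified in this proposal, so the signatures shown here
are purely hypothetical:

```sql
SELECT pg_mmlr_set_create('myset');           -- on the first node
SELECT pg_mmlr_node_attach('myset');          -- on each node joining the set
SELECT pg_mmlr_node_detach('myset', 'node3'); -- remove one node
SELECT pg_mmlr_set_delete('myset');           -- tear the whole set down
```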

Notice this proposal has some external API similarities with the BDR
extension [3] (which also provides multi-master logical replication),
although we plan to implement it entirely using PostgreSQL’s PUB/SUB.

4.0 ACKNOWLEDGEMENTS

The following people have contributed to this proposal – Hayato
Kuroda, Vignesh C, Peter Smith, Amit Kapila.

5.0 REFERENCES

[1]: /messages/by-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
[2]: /messages/by-id/CAHut+PvY2P=UL-X6maMA5QxFKdcdciRRCKDH3j=_hO8u2OyRYg@mail.gmail.com
[3]: https://www.enterprisedb.com/docs/bdr/latest/

[END]

~~~

One of my colleagues will post more detailed information later.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

node-sets.PNG (image/png)
#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Peter Smith (#1)
Re: Multi-Master Logical Replication

On Thu, 2022-04-28 at 09:49 +1000, Peter Smith wrote:

To prevent the circular recursive problem Vignesh is developing a
patch [1] that introduces new SUBSCRIPTION options "local_only" (for
publishing only data originating at the publisher node) and
"copy_data=force". Using this patch, we have created a script [2]
demonstrating how to set up all the above multi-node examples. An
overview of the necessary steps is given in the next section.

I am missing a discussion of how replication conflicts are handled to
prevent replication from breaking or the databases from drifting apart.

Yours,
Laurenz Albe

#3 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Laurenz Albe (#2)
RE: Multi-Master Logical Replication

Dear Laurenz,

Thank you for your interest in our work!

I am missing a discussion of how replication conflicts are handled to
prevent replication from breaking or the databases from drifting apart.

Actually, we don't have plans to develop a feature that avoids conflicts.
We think that it should be done as a core PUB/SUB feature, and
this module will just use that.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#4 Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Hayato Kuroda (Fujitsu) (#3)
Re: Multi-Master Logical Replication

On Thu, 28/04/2022 at 09:49 +1000, Peter Smith wrote:

1.1 ADVANTAGES OF MMLR

- Increases write scalability (e.g., all nodes can write arbitrary data).

I've never heard that transaction-aware multimaster increases
write scalability. Moreover, usually even non-transactional
multimaster doesn't increase write scalability. At best it
doesn't decrease it.

That is because all hosts have to write all changes anyway, while
the side costs increase due to increased network interchange,
interlocking (for transaction-aware MM), and increased latency.

On Thu, 28/04/2022 at 08:34 +0000, kuroda.hayato@fujitsu.com wrote:

Dear Laurenz,

Thank you for your interest in our work!

I am missing a discussion of how replication conflicts are handled to
prevent replication from breaking

Actually, we don't have plans to develop a feature that avoids conflicts.
We think that it should be done as a core PUB/SUB feature, and
this module will just use that.

If you really want to have some proper isolation levels (
Read Committed? Repeatable Read?) and/or want to have
same data on each "master", there is no easy way. If you
think it will be "easy", you are already wrong.

Our company has a MultiMaster product which is built on top of
logical replication. It is even partially open source
( https://github.com/postgrespro/mmts ), although some
core patches that it requires are not up to
date.

And it is the second iteration of MM. The first iteration was
not "simple" or "easy" either. But even that version had
a hidden bug: rare but accumulating data differences
between nodes. The attempt to fix this bug led to an almost
full rewrite of the multi-master.

(Disclaimer: I had no involvement in either MM version;
I just work at the same company).

regards

---------

Yura Sokolov

#5 vignesh C
vignesh21@gmail.com
In reply to: Yura Sokolov (#4)
Re: Multi-Master Logical Replication

On Thu, Apr 28, 2022 at 4:24 PM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

On Thu, 28/04/2022 at 09:49 +1000, Peter Smith wrote:

1.1 ADVANTAGES OF MMLR

- Increases write scalability (e.g., all nodes can write arbitrary data).

I've never heard that transaction-aware multimaster increases
write scalability. Moreover, usually even non-transactional
multimaster doesn't increase write scalability. At best it
doesn't decrease it.

That is because all hosts have to write all changes anyway, while
the side costs increase due to increased network interchange,
interlocking (for transaction-aware MM), and increased latency.

I agree it won't increase in all cases, but it will be better in the
cases where users in different geographical regions operate on
independent schemas in asynchronous mode. Since the write node is
closer to its geographical zone, the performance will be better in
those cases.

On Thu, 28/04/2022 at 08:34 +0000, kuroda.hayato@fujitsu.com wrote:

Dear Laurenz,

Thank you for your interest in our work!

I am missing a discussion of how replication conflicts are handled to
prevent replication from breaking

Actually, we don't have plans to develop a feature that avoids conflicts.
We think that it should be done as a core PUB/SUB feature, and
this module will just use that.

If you really want to have some proper isolation levels (
Read Committed? Repeatable Read?) and/or want to have
same data on each "master", there is no easy way. If you
think it will be "easy", you are already wrong.

The synchronous_commit and synchronous_standby_names configuration
parameters will help in getting the same data across the nodes. Can
you give an example of a scenario where it will be difficult?
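
For reference, the settings referred to above can be sketched as a
minimal SQL fragment (the standby name 'node2' is an illustrative
assumption):

```sql
-- Wait for the named standby to confirm before a commit returns.
ALTER SYSTEM SET synchronous_commit = 'on';
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (node2)';
SELECT pg_reload_conf();
```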

Regards,
Vignesh

#6 Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: vignesh C (#5)
Re: Multi-Master Logical Replication

On Thu, 28/04/2022 at 17:37 +0530, vignesh C wrote:

On Thu, Apr 28, 2022 at 4:24 PM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

On Thu, 28/04/2022 at 09:49 +1000, Peter Smith wrote:

1.1 ADVANTAGES OF MMLR

- Increases write scalability (e.g., all nodes can write arbitrary data).

I've never heard that transaction-aware multimaster increases
write scalability. Moreover, usually even non-transactional
multimaster doesn't increase write scalability. At best it
doesn't decrease it.

That is because all hosts have to write all changes anyway, while
the side costs increase due to increased network interchange,
interlocking (for transaction-aware MM), and increased latency.

I agree it won't increase in all cases, but it will be better in the
cases where users in different geographical regions operate on
independent schemas in asynchronous mode. Since the write node is
closer to its geographical zone, the performance will be better in
those cases.

From the EnterpriseDB BDR page [1]:

Adding more master nodes to a BDR Group does not result in
significant write throughput increase when most tables are
replicated because BDR has to replay all the writes on all nodes.
Because BDR writes are in general more effective than writes coming
from Postgres clients via SQL, some performance increase can be
achieved. Read throughput generally scales linearly with the number
of nodes.

And I'm sure EnterpriseDB does its best.

On Thu, 28/04/2022 at 08:34 +0000, kuroda.hayato@fujitsu.com wrote:

Dear Laurenz,

Thank you for your interest in our work!

I am missing a discussion of how replication conflicts are handled to
prevent replication from breaking

Actually, we don't have plans to develop a feature that avoids conflicts.
We think that it should be done as a core PUB/SUB feature, and
this module will just use that.

If you really want to have some proper isolation levels (
Read Committed? Repeatable Read?) and/or want to have
same data on each "master", there is no easy way. If you
think it will be "easy", you are already wrong.

The synchronous_commit and synchronous_standby_names configuration
parameters will help in getting the same data across the nodes. Can
you give an example of a scenario where it will be difficult?

So, synchronous or asynchronous?
Synchronous commit on every master, on every alive master, or on a
quorum of masters?

And it is not about synchronicity. It is about determinism at
conflicts.

If you have fully deterministic conflict resolution that works
exactly the same way on each host, then it is possible to have the
same data on each host. (But it will not be transactional.) And it
seems EDB BDR achieved this.

Or if you have fully and correctly implemented one of the distributed
transaction protocols.

[1]: https://www.enterprisedb.com/docs/bdr/latest/overview/#characterising-bdr-performance

regards

------

Yura Sokolov

#7 Peter Smith
smithpb2250@gmail.com
In reply to: Yura Sokolov (#6)
Re: Multi-Master Logical Replication

On Fri, Apr 29, 2022 at 2:16 PM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

[snip]

So, synchronous or asynchronous?
Synchronous commit on every master, on every alive master, or on a
quorum of masters?

And it is not about synchronicity. It is about determinism at
conflicts.

If you have fully deterministic conflict resolution that works
exactly the same way on each host, then it is possible to have the
same data on each host. (But it will not be transactional.) And it
seems EDB BDR achieved this.

Or if you have fully and correctly implemented one of the distributed
transaction protocols.

[1] https://www.enterprisedb.com/docs/bdr/latest/overview/#characterising-bdr-performance

regards

------

Yura Sokolov

Thanks for your feedback.

This MMLR proposal was mostly just to create an interface making it
easier to use PostgreSQL core logical replication CREATE
PUBLICATION/SUBSCRIPTION for table sharing among a set of nodes.
Otherwise, this is difficult for a user to do manually. (e.g.
difficulties as mentioned in section 2.2 of the original post [1] -
dealing with initial table data, coordinating the timing/locking to
avoid concurrent updates, getting the SUBSCRIPTION options for
copy_data exactly right etc)

At this time we have no provision for HA, nor for transaction
consistency awareness, conflict resolutions, node failure detections,
DDL replication etc. Some of the features like DDL replication are
currently being implemented [2], so when committed it will become
available in the core, and can then be integrated into this module.

Once the base feature of the current MMLR proposal is done, perhaps it
can be extended in subsequent versions.

Probably our calling this “Multi-Master” has been
misleading/confusing, because that term implies much more to other
readers. We really only intended it to mean the ability to set up
logical replication across a set of nodes. Of course, we can rename
the proposal (and API) to something different if there are better
suggestions.

------
[1]: /messages/by-id/CAHut+PuwRAoWY9pz=Eubps3ooQCOBFiYPU9Yi=VB-U+yORU7OA@mail.gmail.com
[2]: /messages/by-id/45d0d97c-3322-4054-b94f-3c08774bbd90@www.fastmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#8 vignesh C
vignesh21@gmail.com
In reply to: Peter Smith (#7)
Re: Multi-Master Logical Replication

On Fri, Apr 29, 2022 at 2:35 PM Peter Smith <smithpb2250@gmail.com> wrote:

[snip]

Thanks for your feedback.

This MMLR proposal was mostly just to create an interface making it
easier to use PostgreSQL core logical replication CREATE
PUBLICATION/SUBSCRIPTION for table sharing among a set of nodes.
Otherwise, this is difficult for a user to do manually. (e.g.
difficulties as mentioned in section 2.2 of the original post [1] -
dealing with initial table data, coordinating the timing/locking to
avoid concurrent updates, getting the SUBSCRIPTION options for
copy_data exactly right etc)

The different problems, and how to solve each scenario, are described in detail in [1].
It gets even more complex when more nodes are involved; let's
consider the 3-node case.
To add a new node node3 to the existing node1 and node2 when data is
already present in node1 and node2, the following steps are
required:
Create a publication in node3:
CREATE PUBLICATION pub_node3 FOR ALL TABLES;

Create a subscription in node1 to subscribe the changes from node3:
CREATE SUBSCRIPTION sub_node1_node3
CONNECTION 'dbname=foo host=node3 user=repuser'
PUBLICATION pub_node3
WITH (copy_data = off, local_only = on);

Create a subscription in node2 to subscribe the changes from node3:
CREATE SUBSCRIPTION sub_node2_node3
CONNECTION 'dbname=foo host=node3 user=repuser'
PUBLICATION pub_node3
WITH (copy_data = off, local_only = on);

Lock the database at node2 and wait till the walsender sends WAL to
node1 (up to the current lsn) to avoid any data loss due to node2's
WAL not yet having been sent to node1. This lock needs to be held till
the setup is complete.

Create a subscription in node3 to subscribe the changes from node1,
here copy_data is specified as force so that the existing table data
is copied during initial sync:
CREATE SUBSCRIPTION sub_node3_node1
CONNECTION 'dbname=foo host=node1 user=repuser'
PUBLICATION pub_node1
WITH (copy_data = force, local_only = on);

Create a subscription in node3 to subscribe the changes from node2:
CREATE SUBSCRIPTION sub_node3_node2
CONNECTION 'dbname=foo host=node2 user=repuser'
PUBLICATION pub_node2
WITH (copy_data = off, local_only = on);

If data is present in node3, a few more steps are required: a) copy
node3's data to node1, b) copy node3's data to node2, c) alter the
publication not to replicate the truncate operation, d) truncate the
data in node3, and e) alter the publication to again include truncate.

[1]: /messages/by-id/CAA4eK1+co2cd8a6okgUD_pcFEHcc7mVc0k_RE2=6ahyv3WPRMg@mail.gmail.com

Regards,
Vignesh

#9 vignesh C
vignesh21@gmail.com
In reply to: Peter Smith (#1)
Re: Multi-Master Logical Replication

On Thu, Apr 28, 2022 at 5:20 AM Peter Smith <smithpb2250@gmail.com> wrote:

[snip]

One of my colleagues will post more detailed information later.

MMLR has been changed to LRG (Logical Replication Group) to avoid confusion.

The LRG functionality will be implemented as given below:
The lrg contrib module provides a set of APIs to allow setting up
bi-directional logical replication among different nodes. The name lrg
stands for Logical Replication Group.
To use this functionality, shared_preload_libraries must be set to lrg, like:
shared_preload_libraries = 'lrg'
A new process, the "lrg launcher", is added, which will be launched
when the extension is created. This process is responsible for
checking whether the user has created a new logical replication group,
is attaching a new node to a logical replication group, is detaching a
node, or is dropping a logical replication group; if so, it launches a
new "lrg worker" for the corresponding database.
The new "lrg worker" process is responsible for handling the core
tasks of the lrg_create, lrg_node_attach, lrg_node_detach and lrg_drop
functionality.
The "lrg worker" is required here because there are a lot of steps
involved in this process, like create publication, create subscription,
alter publication, lock table, etc. If there is a failure during any
part of the process, the worker will be restarted and is responsible
for continuing the operation from where it left off to completion.
The following new tables were added to maintain the logical
replication group related information:
-- pg_lrg_info table to maintain the logical replication group information.
CREATE TABLE lrg.pg_lrg_info
(
groupname text PRIMARY KEY, -- name of the logical replication group
pubtype text -- type of publication (ALL TABLES, SCHEMA, TABLE);
             -- currently only "ALL TABLES" is supported
);

-- pg_lrg_nodes table to maintain information about the nodes that are
members of the logical replication group.
CREATE TABLE lrg.pg_lrg_nodes
(
nodeid text PRIMARY KEY, -- node id (actual node_id format is
                         -- still not finalized)
groupname text REFERENCES pg_lrg_info(groupname), -- name of the
                         -- logical replication group
dbid oid NOT NULL, -- db id
status text NOT NULL, -- status of the node
nodename text, -- node name
localconn text NOT NULL, -- local connection string
upstreamconn text -- upstream connection string to connect to
                  -- another node already in the logical replication group
);

-- pg_lrg_pub table to maintain the publications that were created
for this node.
CREATE TABLE lrg.pg_lrg_pub
(
groupname text REFERENCES pg_lrg_info(groupname), -- name of the
                         -- logical replication group
pubid oid NOT NULL -- oid of the publication
);

-- pg_lrg_sub table to maintain the subscriptions that were created
for this node.
CREATE TABLE lrg.pg_lrg_sub
(
groupname text REFERENCES pg_lrg_info(groupname), -- name of the
                         -- logical replication group
subid oid NOT NULL -- oid of the subscription
);

The following functionality was added to support the various logical
replication group functionalities:
lrg_create(group_name text, pub_type text, local_connection_string text, node_name text)
lrg_node_attach(group_name text, local_connection_string text, upstream_connection_string text, node_name text)
lrg_node_detach(group_name text, node_name text)
lrg_drop(group_name text)
-----------------------------------------------------------------------------------------------------------------------------------

lrg_create – This function creates a logical replication group as
specified in group_name.
example:
postgres=# SELECT lrg.lrg_create('test', 'FOR ALL TABLES',
'user=postgres port=5432', 'testnode1');

This function adds a logical replication group "test" with pubtype
"FOR ALL TABLES" to pg_lrg_info as given below:
postgres=# select * from lrg.pg_lrg_info;
 groupname |    pubtype
-----------+----------------
 test      | FOR ALL TABLES
(1 row)

It adds node information which includes the node id, database id,
status, node name, connection string and upstream connection string to
pg_lrg_nodes like given below:
postgres=# select * from lrg.pg_lrg_nodes;
                    nodeid                    | groupname | dbid | status | nodename  |        localconn        | upstreamconn
----------------------------------------------+-----------+------+--------+-----------+-------------------------+--------------
 70934590432710321605user=postgres port=5432  | test      |    5 | ready  | testnode1 | user=postgres port=5432 |
(1 row)

The "lrg worker" will perform the following:
1) It will lock the pg_lrg_info and pg_lrg_nodes tables.
2) It will create the publication in the current node.
3) It will change the (pg_lrg_nodes) status from init to createpublication.
4) It will unlock the pg_lrg_info and pg_lrg_nodes tables.
5) It will change the (pg_lrg_nodes) status from createpublication to ready.
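
The steps above could be sketched as the SQL a worker might issue (a
rough sketch only; the actual worker performs these internally in C,
and names such as lrg_test_pub and the status-update statements are
illustrative assumptions):

```sql
BEGIN;
-- Step 1: lock the group catalogs against concurrent LRG operations.
LOCK TABLE lrg.pg_lrg_info, lrg.pg_lrg_nodes IN ACCESS EXCLUSIVE MODE;
-- Step 2: create the group's publication on this node.
CREATE PUBLICATION lrg_test_pub FOR ALL TABLES;
-- Step 3: record the progress so a restarted worker can resume here.
UPDATE lrg.pg_lrg_nodes SET status = 'createpublication'
 WHERE nodename = 'testnode1';
COMMIT;  -- step 4: the locks are released at commit
-- Step 5: mark the node ready.
UPDATE lrg.pg_lrg_nodes SET status = 'ready'
 WHERE nodename = 'testnode1';
```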
-----------------------------------------------------------------------------------------------------------------------------------

lrg_node_attach – Attach the specified node to the specified logical
replication group.
example:
postgres=# SELECT lrg.lrg_node_attach('test', 'user=postgres
port=9999', 'user=postgres port=5432', 'testnode2');
This function adds the logical replication group "test" with pubtype
"FOR ALL TABLES" to pg_lrg_info in the new node, as shown below:
postgres=# select * from lrg.pg_lrg_info;
 groupname |    pubtype
-----------+----------------
 test      | FOR ALL TABLES
(1 row)

This is the same group name that was added by lrg_create on the node
where the group was created. Now this information is available in the
new node too, which allows the user to attach via any of the nodes
present in the logical replication group.
It adds node information which includes the node id, database id,
status, node name, connection string and upstream connection string of
the current node and the other nodes that are part of the logical
replication group to pg_lrg_nodes like given below:
postgres=# select * from lrg.pg_lrg_nodes;
                           nodeid                           | groupname | dbid | status | nodename  |                localconn                |             upstreamconn
------------------------------------------------------------+-----------+------+--------+-----------+-----------------------------------------+----------------------------------------
 70937999584732760095user=vignesh dbname=postgres port=9999 | test      |    5 | ready  | testnode2 | user=vignesh dbname=postgres port=9999  | user=vignesh dbname=postgres port=5432
 70937999523629205245user=vignesh dbname=postgres port=5432 | test      |    5 | ready  | testnode1 | user=vignesh dbname=postgres port=5432  |
(2 rows)

It will use the upstream connection to connect to the upstream node
and get the nodes that are part of the logical replication group.
Note: The nodeid shown here is for illustrative purposes; the actual
nodeid format is still not finalized.
For this API the "lrg worker" will perform the following:
1) It will lock the pg_lrg_info and pg_lrg_nodes tables.
2) It will connect to the upstream node specified and get the list of
other nodes present in the logical replication group.
3) It will connect to the remaining nodes and lock the database so
that no new operations are performed.
4) It will wait until the upstream node reaches the latest lsn of the
remaining nodes; this is somewhat similar to the wait_for_catchup
function in the TAP tests.
5) It will change the status (pg_lrg_nodes) from init to waitforlsncatchup.
6) It will create the publication in the current node.
7) It will change the status (pg_lrg_nodes) from waitforlsncatchup to
createpublication.
8) It will create a subscription in all the remaining nodes to get the
data from new node.
9) It will change the status (pg_lrg_nodes) from createpublication to
createsubscription.
10) It will alter the publication not to replicate truncate operation.
11) It will truncate the table.
12) It will alter the publication to include sending the truncate operation.
13) It will create a subscription in the current node to subscribe to
the data with copy_data = force.
14) It will create a subscription in the remaining nodes to subscribe
to the data with copy_data = off.
15) It will unlock the database in all the remaining nodes.
16) It will unlock the pg_lrg_info and pg_lrg_nodes tables.
17) It will change the status (pg_lrg_nodes) from createsubscription to ready.
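
Steps 10-14 on the joining node correspond roughly to SQL like the
following (a sketch; the publication/subscription names and table t1
are illustrative, and copy_data = force is the option added by the
patch in Vignesh's thread):

```sql
-- Step 10: stop publishing truncates so the local truncate is not replicated.
ALTER PUBLICATION lrg_test_pub SET (publish = 'insert, update, delete');
-- Step 11: clear the local pre-existing data.
TRUNCATE t1;
-- Step 12: re-enable truncate replication.
ALTER PUBLICATION lrg_test_pub SET (publish = 'insert, update, delete, truncate');
-- Step 13: subscribe to the upstream node, copying its full data set.
CREATE SUBSCRIPTION lrg_test_sub
  CONNECTION 'user=postgres port=5432'
  PUBLICATION lrg_test_pub_upstream
  WITH (copy_data = force);
```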

The status will be useful for displaying the progress of the operation
to the user, and it helps failure handling to continue the operation
from the state at which it failed.
-----------------------------------------------------------------------------------------------------------------------------------

lrg_node_detach – detach a node from the logical replication group.
example:
postgres=# SELECT lrg.lrg_node_detach('test', 'testnode');
For this API the "lrg worker" will perform the following:
1) It will lock the pg_lrg_info and pg_lrg_nodes tables.
2) It will get the list of other nodes present in the logical replication group.
3) It will connect to the remaining nodes and lock the database so
that no new operations are performed.
4) It will drop the subscription in all the nodes corresponding to
this node of the cluster.
5) It will drop the publication in the current node.
6) It will remove all the data associated with this logical
replication group from pg_lrg_* tables.
7) It will unlock the pg_lrg_info and pg_lrg_nodes tables.
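
Steps 4-6 amount to something like the following (a sketch; the object
names are illustrative assumptions):

```sql
-- Step 4 (on each remaining node): drop the subscription that pulls
-- data from the detached node.
DROP SUBSCRIPTION lrg_test_sub_testnode;
-- Step 5 (on the detached node): drop its publication for the group.
DROP PUBLICATION lrg_test_pub;
-- Step 6: remove this group's metadata for the node.
DELETE FROM lrg.pg_lrg_nodes WHERE nodename = 'testnode';
```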
-----------------------------------------------------------------------------------------------------------------------------------

lrg_drop - drop a group from logical replication groups.
example:
postgres=# SELECT lrg.lrg_drop('test');

This function removes the specified group from the logical replication
groups. This function must be executed on a member node of the given
logical replication group.
For this API the "lrg worker" will perform the following:
1) It will lock the pg_lrg_info and pg_lrg_nodes tables.
2) DROP PUBLICATION of this node that was created for this logical
replication group.
3) Remove all data from the logical replication group system table
associated with the logical replication group.
4) It will unlock the pg_lrg_info and pg_lrg_nodes tables.

If there are no objections the API can be implemented as SQL functions
in PostgreSQL core and the new tables can be created as system tables.

Thoughts?

Regards,
Vignesh

#10Bruce Momjian
bruce@momjian.us
In reply to: Peter Smith (#7)
Re: Multi-Master Logical Replication

On Fri, Apr 29, 2022 at 07:05:11PM +1000, Peter Smith wrote:

This MMLR proposal was mostly just to create an interface making it
easier to use PostgreSQL core logical replication CREATE
PUBLICATION/SUBSCRIPTION for table sharing among a set of nodes.
Otherwise, this is difficult for a user to do manually. (e.g.
difficulties as mentioned in section 2.2 of the original post [1] -
dealing with initial table data, coordinating the timing/locking to
avoid concurrent updates, getting the SUBSCRIPTION options for
copy_data exactly right etc)

At this time we have no provision for HA, nor for transaction
consistency awareness, conflict resolutions, node failure detections,
DDL replication etc. Some of the features like DDL replication are
currently being implemented [2], so when committed it will become
available in the core, and can then be integrated into this module.

Uh, without these features, what workload would this help with? I think
you made the mistake of jumping too far into implementation without
explaining the problem you are trying to solve. The TODO list has this
ordering:

https://wiki.postgresql.org/wiki/Todo
Desirability -> Design -> Implement -> Test -> Review -> Commit

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#10)
Re: Multi-Master Logical Replication

On Sat, May 14, 2022 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:

Uh, without these features, what workload would this help with?

To allow replication among multiple nodes when some of the nodes may
have pre-existing data. This work plans to provide simple APIs to
achieve that. Now, let me try to explain the difficulties users can
face with the existing interface. It is simple to set up replication
among various nodes when they don't have any pre-existing data but
even in that case if the user operates on the same table at multiple
nodes, the replication will lead to an infinite loop and won't
proceed. The example in email [1] demonstrates that and the patch in
that thread attempts to solve it. I have mentioned that problem
because this work will need that patch.

Now, let's take a simple case where two nodes have the same table
which has some pre-existing data:

Node-1:
Table t1 (c1 int) has data
1, 2, 3, 4

Node-2:
Table t1 (c1 int) has data
5, 6, 7, 8

If we have to set up replication among the above two nodes using
existing interfaces, it could be very tricky. Say user performs
operations like below:

Node-1
#Publication for t1
Create Publication pub1 For Table t1;

Node-2
#Publication for t1,
Create Publication pub1_2 For Table t1;

Node-1:
Create Subscription sub1 Connection '<node-2 details>' Publication pub1_2;

Node-2:
Create Subscription sub1_2 Connection '<node-1 details>' Publication pub1;

After this the data will be something like this:
Node-1:
1, 2, 3, 4, 5, 6, 7, 8

Node-2:
1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8

So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
case, table t1 has a unique key, it will lead to a unique key
violation and replication won't proceed. Here, I have assumed that we
already have functionality for the patch in email [1], otherwise,
replication will be an infinite loop replicating the above data again
and again. Now one way to achieve this could be that we can ask users
to stop all operations on both nodes before starting replication
between those and take data dumps of tables from each node they want
to replicate and restore them to other nodes. Then use the above
commands to set up replication and allow to start operations on those
nodes. The other possibility for users could be as below. Assume, we
have already created publications as in the above example, and then:

Node-2:
Create Subscription sub1_2 Connection '<node-1 details>' Publication pub1;

#Wait for the initial sync of table t1 to finish. Users can ensure
that by checking 'srsubstate' in pg_subscription_rel.
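
For example, the check could look like this (pg_subscription_rel is an
existing system catalog; the state codes follow the PostgreSQL
documentation):

```sql
-- Run on the subscriber; wait until every relation reports 'r' (ready).
SELECT srrelid::regclass AS relation, srsubstate
  FROM pg_subscription_rel;
-- srsubstate: 'i' = initialize, 'd' = data is being copied,
--             's' = synchronized, 'r' = ready (normal replication)
```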

Node-1:
Begin;
# Disallow truncates to be published and then truncate the table
Alter Publication pub1 Set (publish = 'insert, update, delete');
Truncate t1;
Create Subscription sub1 Connection '<node-2 details>' Publication pub1_2;
Alter Publication pub1 Set (publish = 'insert, update, delete, truncate');
Commit;

This will become more complicated when more than two nodes are
involved, see the example provided for the three nodes case [2]. Can
you think of some other simpler way to achieve the same? If not, I
don't think the current way is ideal and even users won't prefer that.
I am not telling that the APIs proposed in this thread is the only or
best way to achieve the desired purpose but I think we should do
something to allow users to easily set up replication among multiple
nodes.

[1]: /messages/by-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
[2]: /messages/by-id/CALDaNm3aD3nZ0HWXA8V435AGMvORyR5-mq2FzqQdKQ8CPomB5Q@mail.gmail.com

--
With Regards,
Amit Kapila.

#12Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#11)
RE: Multi-Master Logical Replication

Hi hackers,

I created a small PoC. Please see the attached patches.

REQUIREMENT

Before applying these patches, the patches in [1] must also be applied.

DIFFERENCES FROM PREVIOUS DESCRIPTIONS

* LRG is now implemented as SQL functions, not as a contrib module.
* New tables are added as system catalogs. Therefore, added tables have oid column.
* The node_id is the concatenation of the system identifier and the dbid.

HOW TO USE

In the document patch, a subsection 'Example' was added for understanding LRG. In short, we can do

1. lrg_create on one node
2. lrg_node_attach on another node
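
Using the function signatures from earlier in this thread, the two
calls would look like this (the connection details are illustrative):

```sql
-- On node 1 (port 5432): create the group.
SELECT lrg.lrg_create('test', 'FOR ALL TABLES',
                      'user=postgres port=5432', 'testnode1');
-- On node 2 (port 9999): attach, using node 1 as the upstream.
SELECT lrg.lrg_node_attach('test', 'user=postgres port=9999',
                           'user=postgres port=5432', 'testnode2');
```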

Also attached is a test script that constructs a three-node system.

LIMITATIONS

This feature is under development, so there are still many limitations on its use.

* The function for detaching a node from a group is not implemented.
* The function for removing a group is not implemented.
* LRG does not lock system catalogs and databases. Concurrent operations may cause inconsistent state.
* LRG does not wait until the upstream node reaches the latest lsn of the remaining nodes.
* LRG does not support initial data sync. That is, it can work well only when all nodes do not have initial data.

[1]: https://commitfest.postgresql.org/38/3610/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

test.sh
v1-0001-PoC-implement-LRG.patch
v1-0002-add-doc.patch
#13Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#12)
RE: Multi-Master Logical Replication

Hi hackers,

The patch in [1] has changed the name of the parameter, so I rebased
the patch. Furthermore, I implemented the first version of the
lrg_node_detach and lrg_drop functions, and fixed some code comments.

0001 and 0002 were copied from [1]; they were attached for the cfbot.
Please see 0003 and 0004 for LRG related codes.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v2-0003-PoC-implement-LRG.patch
v2-0004-add-doc.patch
v14-0001-Skip-replication-of-non-local-data.patch
v14-0002-Support-force-option-for-copy_data-check-and-thr.patch
#14Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#13)
RE: Multi-Master Logical Replication

Sorry, I forgot to attach the test script.
For cfbot I attached again all files. Sorry for the noise.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v2-0004-add-doc.patch
v14-0001-Skip-replication-of-non-local-data.patch
v14-0002-Support-force-option-for-copy_data-check-and-thr.patch
test.sh
v2-0003-PoC-implement-LRG.patch
#15Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#11)
Re: Multi-Master Logical Replication

On Sat, May 14, 2022 at 12:20:05PM +0530, Amit Kapila wrote:

On Sat, May 14, 2022 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:

Uh, without these features, what workload would this help with?

To allow replication among multiple nodes when some of the nodes may
have pre-existing data. This work plans to provide simple APIs to
achieve that. Now, let me try to explain the difficulties users can
face with the existing interface. It is simple to set up replication
among various nodes when they don't have any pre-existing data but
even in that case if the user operates on the same table at multiple
nodes, the replication will lead to an infinite loop and won't
proceed. The example in email [1] demonstrates that and the patch in
that thread attempts to solve it. I have mentioned that problem
because this work will need that patch.

...

This will become more complicated when more than two nodes are
involved, see the example provided for the three nodes case [2]. Can
you think of some other simpler way to achieve the same? If not, I
don't think the current way is ideal and even users won't prefer that.
I am not telling that the APIs proposed in this thread is the only or
best way to achieve the desired purpose but I think we should do
something to allow users to easily set up replication among multiple
nodes.

You still have not answered my question above. "Without these features,
what workload would this help with?" You have only explained how the
patch would fix one of the many larger problems.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#15)
Re: Multi-Master Logical Replication

On Tue, May 24, 2022 at 5:57 PM Bruce Momjian <bruce@momjian.us> wrote:

On Sat, May 14, 2022 at 12:20:05PM +0530, Amit Kapila wrote:

On Sat, May 14, 2022 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:

Uh, without these features, what workload would this help with?

To allow replication among multiple nodes when some of the nodes may
have pre-existing data. This work plans to provide simple APIs to
achieve that. Now, let me try to explain the difficulties users can
face with the existing interface. It is simple to set up replication
among various nodes when they don't have any pre-existing data but
even in that case if the user operates on the same table at multiple
nodes, the replication will lead to an infinite loop and won't
proceed. The example in email [1] demonstrates that and the patch in
that thread attempts to solve it. I have mentioned that problem
because this work will need that patch.

...

This will become more complicated when more than two nodes are
involved, see the example provided for the three nodes case [2]. Can
you think of some other simpler way to achieve the same? If not, I
don't think the current way is ideal and even users won't prefer that.
I am not telling that the APIs proposed in this thread is the only or
best way to achieve the desired purpose but I think we should do
something to allow users to easily set up replication among multiple
nodes.

You still have not answered my question above. "Without these features,
what workload would this help with?" You have only explained how the
patch would fix one of the many larger problems.

It helps with setting up logical replication among two or more nodes
(data flows both ways) which is important for use cases where
applications are data-aware. For such apps, it will be beneficial to
always send and retrieve data to local nodes in a geographically
distributed database. Now, for such apps, to get 100% consistent data
among nodes, one needs to enable synchronous_mode (aka set
synchronous_standby_names) but if that hurts performance and the data
is for analytical purposes then one can use it in asynchronous mode.
Now, for such cases, if the local node goes down, the other master
node can be immediately available to use, sure it may slow down the
operations for some time till the local node come-up. For such apps,
later it will be also easier to perform online upgrades.

Without this, if the user tries to achieve the same via physical
replication by having two local nodes, it can take quite long before
the standby can be promoted to master and local reads/writes will be
much costlier.

--
With Regards,
Amit Kapila.

#17Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#16)
Re: Multi-Master Logical Replication

On Wed, May 25, 2022 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 24, 2022 at 5:57 PM Bruce Momjian <bruce@momjian.us> wrote:

On Sat, May 14, 2022 at 12:20:05PM +0530, Amit Kapila wrote:

On Sat, May 14, 2022 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:

Uh, without these features, what workload would this help with?

To allow replication among multiple nodes when some of the nodes may
have pre-existing data. This work plans to provide simple APIs to
achieve that. Now, let me try to explain the difficulties users can
face with the existing interface. It is simple to set up replication
among various nodes when they don't have any pre-existing data but
even in that case if the user operates on the same table at multiple
nodes, the replication will lead to an infinite loop and won't
proceed. The example in email [1] demonstrates that and the patch in
that thread attempts to solve it. I have mentioned that problem
because this work will need that patch.

...

This will become more complicated when more than two nodes are
involved, see the example provided for the three nodes case [2]. Can
you think of some other simpler way to achieve the same? If not, I
don't think the current way is ideal and even users won't prefer that.
I am not telling that the APIs proposed in this thread is the only or
best way to achieve the desired purpose but I think we should do
something to allow users to easily set up replication among multiple
nodes.

You still have not answered my question above. "Without these features,
what workload would this help with?" You have only explained how the
patch would fix one of the many larger problems.

It helps with setting up logical replication among two or more nodes
(data flows both ways) which is important for use cases where
applications are data-aware. For such apps, it will be beneficial to
always send and retrieve data to local nodes in a geographically
distributed database. Now, for such apps, to get 100% consistent data
among nodes, one needs to enable synchronous_mode (aka set
synchronous_standby_names) but if that hurts performance and the data
is for analytical purposes then one can use it in asynchronous mode.
Now, for such cases, if the local node goes down, the other master
node can be immediately available to use, sure it may slow down the
operations for some time till the local node come-up. For such apps,
later it will be also easier to perform online upgrades.

Without this, if the user tries to achieve the same via physical
replication by having two local nodes, it can take quite long before
the standby can be promoted to master and local reads/writes will be
much costlier.

As mentioned above, the LRG idea might be a useful addition to logical
replication for configuring certain types of "data-aware"
applications.

LRG for data-aware apps (e.g. sensor data)
------------------------------------------
Consider an example where there are multiple weather stations for a
country. Each weather station is associated with a PostgreSQL node and
inserts the local sensor data (e.g. wind/rain/sunshine) once a
minute into some local table. The row data is identified by some station
ID.

- Perhaps there are many nodes.

- Loss of a single row of replicated sensor data if some node goes
down is not a major problem for this sort of application.

- Benefits of processing data locally can be realised.

- Using LRG simplifies the setup/sharing of the data across all group
nodes via a common table.

~~

LRG makes setup easier
----------------------
Although it is possible already (using Vignesh's "infinite recursion"
WIP patch [1]) to set up this kind of environment using logical
replication, as the number of nodes grows it becomes more and more
difficult to do it. For each new node, there needs to be N-1 x CREATE
SUBSCRIPTION for the other group nodes, meaning the connection details
for every other node also must be known up-front for the script.

OTOH, the LRG API can simplify all this, removing the user's burden
and risk of mistakes. Also, LRG only needs to know how to reach just 1
other node in the group (the implementation will discover all the
other node connection details internally).

~~

LRG can handle initial table data
--------------------------------
If the joining node (e.g. a new weather station) already has some
initial local sensor data then sharing that initial data manually with
all the other nodes requires some tricky steps. LRG can hide all this
complexity behind the API, so it is not a user problem anymore.

------
[1]: /messages/by-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#18Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#16)
Re: Multi-Master Logical Replication

On Wed, May 25, 2022 at 12:13:17PM +0530, Amit Kapila wrote:

You still have not answered my question above. "Without these features,
what workload would this help with?" You have only explained how the
patch would fix one of the many larger problems.

It helps with setting up logical replication among two or more nodes
(data flows both ways) which is important for use cases where
applications are data-aware. For such apps, it will be beneficial to

That does make sense, thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

#19Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bruce Momjian (#18)
RE: Multi-Master Logical Replication

Dear hackers,

I added documentation more and tap-tests about LRG.
Same as the previous e-mail, 0001 and 0002 are copied from [1].

The following lists are the TODO items for the patches; they will be solved one by one.

## Functional

* implement a new state "waitforlsncatchup",
that waits until the upstream node receives the latest lsn of the remaining nodes
* implement an over-node locking mechanism
* implement operations that share the initial data
* implement mechanisms to avoid concurrent API execution

Note that tap-tests must also be added if the above are added.

## Implementation

* consider failure handling while executing the APIs
* add error codes for LRG
* change elog() to ereport() for native language support
* define pg_lrg_nodes, which has NULL-able attributes, in the proper style

[1]: https://commitfest.postgresql.org/38/3610/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v3-0003-PoC-implement-LRG.patch
v3-0004-add-tap-tests.patch
v3-0005-add-documents.patch
v16-0001-Skip-replication-of-non-local-data.patch
v16-0002-Support-force-option-for-copy_data-check-and-thr.patch
#20Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#18)
Re: Multi-Master Logical Replication

On Wed, May 25, 2022 at 10:32:50PM -0400, Bruce Momjian wrote:

On Wed, May 25, 2022 at 12:13:17PM +0530, Amit Kapila wrote:

You still have not answered my question above. "Without these features,
what workload would this help with?" You have only explained how the
patch would fix one of the many larger problems.

It helps with setting up logical replication among two or more nodes
(data flows both ways) which is important for use cases where
applications are data-aware. For such apps, it will be beneficial to

That does make sense, thanks.

Uh, thinking some more, why would anyone set things up this way ---
having part of a table being primary on one server and a different part
of the table be a subscriber. Seems it would be simpler and safer to
create two child tables and have one be primary on only one server.
Users can access both tables using the parent.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

#21Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#21)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#22)
#24Peter Smith
smithpb2250@gmail.com
In reply to: Bruce Momjian (#22)
#25Bruce Momjian
bruce@momjian.us
In reply to: Peter Smith (#24)
#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#25)
#27Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Peter Smith (#17)
#28Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Peter Smith (#1)
#29Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#28)
#30Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#29)
#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#30)
#32r.takahashi_2@fujitsu.com
r.takahashi_2@fujitsu.com
In reply to: Amit Kapila (#31)
#33Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: r.takahashi_2@fujitsu.com (#32)
#34r.takahashi_2@fujitsu.com
r.takahashi_2@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#33)
#35Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: r.takahashi_2@fujitsu.com (#34)