Single transaction in the tablesync worker?
The tablesync worker in logical replication performs the table data
sync in a single transaction, which means it will copy the initial data
and then catch up with the apply worker in the same transaction. There
is a comment in LogicalRepSyncTableStart ("We want to do the table data
sync in a single transaction.") saying so, but I can't find the
concrete reasoning behind it. Is there any fundamental problem if
we commit the transaction after the initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in the apply worker? I have tried doing so in the attached (a
quick prototype to test) and didn't find any problems with the
regression tests. I have tried a few manual tests as well to see if it
works and didn't find any problem. Now, it is quite possible that it is
mandatory to do it the way we are doing it currently, or maybe something
else is required to remove this requirement, but I think we can do
better with respect to the comments in this area.
The reason why I am looking into this area is to support the logical
decoding of prepared transactions. See the problem [1] reported by
Peter Smith. Basically, when we stream prepared transactions in the
tablesync worker, it will simply commit the same due to the
requirement of maintaining a single transaction for the entire
duration of copy and streaming of transactions. Now, we can fix that
problem by disabling the decoding of prepared xacts in tablesync
worker. But that will give rise to a different kind of problem: the
prepare will not be sent by the publisher, but a later commit might
move the LSN to a later point, which will allow the tablesync worker to
catch up with the apply worker. So, now the prepared transaction will
be skipped by both the tablesync and apply workers.
I think apart from unblocking the development of 'logical decoding of
prepared xacts', it will make the code consistent between the apply and
tablesync workers and reduce the chances of future bugs in this area.
Basically, it will reduce the checks related to am_tablesync_worker()
at various places in the code.
I see that this code was added as part of commit
7c4f52409a8c7d85ed169bbbc1f6092274d03920 (Logical replication support
for initial data copy).
Thoughts?
[1]: /messages/by-id/CAHut+PuEMk4SO8oGzxc_ftzPkGA8uC-y5qi-KRqHSy_P0i30DA@mail.gmail.com
--
With Regards,
Amit Kapila.
Attachments:
v1-0001-Allow-more-than-one-transaction-in-tablesync-work.patch
From 9f2d1ff2a181136efe2d5db0e6ac43bec909a1f1 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 3 Dec 2020 14:18:19 +0530
Subject: [PATCH v1] Allow more than one transaction in tablesync worker.
---
src/backend/replication/logical/tablesync.c | 9 ++++++++-
src/backend/replication/logical/worker.c | 19 +++++--------------
2 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..886298e 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -270,7 +270,8 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ if (!IsTransactionState())
+ StartTransactionCommand();
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
@@ -294,6 +295,9 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
}
else
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+
+ if (IsTransactionState())
+ CommitTransactionCommand();
}
/*
@@ -943,6 +947,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 8c7fad8..af6a98a 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -888,10 +884,7 @@ apply_handle_stream_abort(StringInfo s)
{
/* Cleanup the subxact info */
cleanup_subxact_info();
-
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +911,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1054,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
--
1.8.3.1
On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
The tablesync worker in logical replication performs the table data
sync in a single transaction, which means it will copy the initial data
and then catch up with the apply worker in the same transaction.
..
Is there any fundamental problem if
we commit the transaction after the initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in the apply worker?
..
If we commit the initial copy, the data up to the initial copy's
snapshot will be visible downstream. If we apply the changes by
committing changes per transaction, the data visible to the other
transactions will differ as the apply progresses. You haven't
clarified whether we will respect the transaction boundaries in the
apply log or not. I assume we will. Whereas if we apply all the
changes in one go, other transactions either see the data before
the resync or after it, without any intermediate states. That will not
violate consistency, I think.
That's all I can think of as the reason behind doing a whole resync as
a single transaction.
--
Best Wishes,
Ashutosh Bapat
On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
The tablesync worker in logical replication performs the table data
sync in a single transaction, which means it will copy the initial data
and then catch up with the apply worker in the same transaction.
..

If we commit the initial copy, the data up to the initial copy's
snapshot will be visible downstream. If we apply the changes by
committing changes per transaction, the data visible to the other
transactions will differ as the apply progresses.
It is not clear what you mean by the above. The way you have written
it, it appears you are saying that instead of copying the initial data,
I am saying to copy it transaction-by-transaction. But that is not the
case. I am saying copy the initial data by using REPEATABLE READ
isolation level as we are doing now, commit it and then process
transaction-by-transaction till we reach the sync point (the point up
to which the apply worker has already received the data).
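
To make the current flow concrete, this is roughly the command sequence
the tablesync worker issues on its publisher connection today (a
sketch; the slot and table names are made up for illustration):

BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
-- USE_SNAPSHOT makes this transaction use the snapshot at which the
-- slot becomes consistent, so the COPY sees exactly the data as of
-- the slot's starting point.
CREATE_REPLICATION_SLOT "pg_16395_sync_16385" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
COPY public.mytbl1 TO STDOUT;
COMMIT;
-- catch-up then streams changes from the slot's consistent point

The proposal is to commit on the subscriber side once the COPY is done
and then apply each streamed transaction in its own transaction.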
You haven't
clarified whether we will respect the transaction boundaries in the
apply log or not. I assume we will.
It will be transaction-by-transaction.
Whereas if we apply all the
changes in one go, other transactions either see the data before
resync or after it without any intermediate states.
What is the problem even if the user is able to see the data after the
initial copy?
That will not
violate consistency, I think.
I am not sure how consistency will be broken.
That's all I can think of as the reason behind doing a whole resync as
a single transaction.
Thanks for sharing your thoughts.
--
With Regards,
Amit Kapila.
On Thu, 3 Dec 2020 at 17:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
Is there any fundamental problem if
we commit the transaction after the initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in the apply worker?
No fundamental problem. Both approaches are fine. Committing the
initial copy then doing the rest in individual txns means an
incomplete sync state for the table becomes visible, which may not be
ideal. Ideally we'd do something like sync the data into a clone of
the table then swap the table relfilenodes out once we're synced up.
IMO the main advantage of committing as we go is that it would let us
use a non-temporary slot and support recovering an incomplete sync and
finishing it after interruption by connection loss, crash, etc. That
would be advantageous for big table syncs or where the sync has lots
of lag to replay. But it means we have to remember sync states, and
give users a way to cancel/abort them. Otherwise forgotten temp slots
for syncs will cause a mess on the upstream.
It also allows the sync slot to advance, freeing any held upstream
resources before the whole sync is done, which is good if the upstream
is busy and generating lots of WAL.
Finally, committing as we go means we won't exceed the cid increment
limit in a single txn.
The reason why I am looking into this area is to support the logical
decoding of prepared transactions. See the problem [1] reported by
Peter Smith. Basically, when we stream prepared transactions in the
tablesync worker, it will simply commit the same due to the
requirement of maintaining a single transaction for the entire
duration of copy and streaming of transactions. Now, we can fix that
problem by disabling the decoding of prepared xacts in tablesync
worker.
Tablesync should indeed only receive a txn when the commit arrives; it
should not attempt to handle uncommitted prepared xacts.
But that will give rise to a different kind of problem: the
prepare will not be sent by the publisher, but a later commit might
move the LSN to a later point, which will allow the tablesync worker to
catch up with the apply worker. So, now the prepared transaction will
be skipped by both the tablesync and apply workers.
I'm not sure I understand. If what you describe is possible then
there's already a bug in prepared xact handling. Prepared xact commit
progress should be tracked by commit lsn, not by prepare lsn.
Can you set out the ordering of events in more detail?
I think apart from unblocking the development of 'logical decoding of
prepared xacts', it will make the code consistent between the apply and
tablesync workers and reduce the chances of future bugs in this area.
Basically, it will reduce the checks related to am_tablesync_worker()
at various places in the code.
I think we made similar changes in pglogical to switch to applying
sync work in individual txns.
On Fri, Dec 4, 2020 at 7:53 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
On Thu, 3 Dec 2020 at 17:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
The reason why I am looking into this area is to support the logical
decoding of prepared transactions.
..

Tablesync should indeed only receive a txn when the commit arrives; it
should not attempt to handle uncommitted prepared xacts.
Why? If we go with the commit-as-we-go approach for individual
transactions in the tablesync worker, then this shouldn't be a problem.
But that will give rise to a different kind of problem: the
prepare will not be sent by the publisher, but a later commit might
move the LSN to a later point, which will allow the tablesync worker to
catch up with the apply worker. So, now the prepared transaction will
be skipped by both the tablesync and apply workers.

I'm not sure I understand. If what you describe is possible then
there's already a bug in prepared xact handling. Prepared xact commit
progress should be tracked by commit lsn, not by prepare lsn.
Oh no, I am talking about the commit of some other transaction.
Can you set out the ordering of events in more detail?
Sure. It will be something like the following, where the apply worker
is ahead of the sync worker.
Assume t1 has some data which the tablesync worker has to first copy.
tx1
Begin;
Insert into t1....
Prepare Transaction 'foo'
tx2
Begin;
Insert into t1....
Commit
apply worker
• tx1: replays - does not apply anything because
should_apply_changes_for_rel thinks relation is not ready
• tx2: replays - does not apply anything because
should_apply_changes_for_rel thinks relation is not ready
tablesync worder
• tx1: handles: BEGIN - INSERT - PREPARE 'xyz'; (but tablesync gets
nothing because say we disable 2-PC for it)
• tx2: handles: BEGIN - INSERT - COMMIT;
• tablelsync exits
Now the situation is that the apply worker has skipped the prepared
xact data and the tablesync worker has not received it, so it has not
applied it. Next, when we get the Commit Prepared for tx1, it will
silently commit the prepared transaction without any data being
updated. The Commit Prepared won't error out on the subscriber because
the prepare would have been successful even though the data was skipped
via should_apply_changes_for_rel.
I think apart from unblocking the development of 'logical decoding of
prepared xacts', it will make the code consistent between the apply and
tablesync workers and reduce the chances of future bugs in this area.
Basically, it will reduce the checks related to am_tablesync_worker()
at various places in the code.

I think we made similar changes in pglogical to switch to applying
sync work in individual txns.
oh, cool. Did you make some additional changes as you have mentioned
in the earlier part of the email?
--
With Regards,
Amit Kapila.
On Fri, Dec 4, 2020 at 7:53 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
On Thu, 3 Dec 2020 at 17:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
Is there any fundamental problem if
we commit the transaction after the initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in the apply worker?

No fundamental problem. Both approaches are fine. Committing the
initial copy then doing the rest in individual txns means an
incomplete sync state for the table becomes visible, which may not be
ideal.
..

Finally, committing as we go means we won't exceed the cid increment
limit in a single txn.
Yeah, all these are advantages of processing
transaction-by-transaction. IIUC, we primarily need to do two things
to achieve it: one is to have an additional state in the catalog (say
'catchup') which will say that the initial copy is done. Then we need
a permanent slot through which we can track the replay progress, so
that after a restart (due to crash, connection break, etc.) we can
start from the appropriate position.
Apart from the above, I think with the current design of tablesync we
can see partial data of transactions because we allow all the
tablesync workers to run in parallel. Consider the below scenario:
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE TABLE mytbl2(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
Tx1
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl2(somedata, text) VALUES (1, 1);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl;
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
Tx2
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
INSERT INTO mytbl2(somedata, text) VALUES (1, 2);
Commit;
Tx3
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl2(somedata, text) VALUES (1, 3);
Commit;
Now, I could see the below results on subscriber:
postgres=# select * from mytbl1;
id | somedata | text
----+----------+------
(0 rows)
postgres=# select * from mytbl2;
id | somedata | text
----+----------+------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
(3 rows)
Basically, the results for Tx1, Tx2, Tx3 are visible for mytbl2 but
not for mytbl1. To reproduce this I stopped the tablesync workers
(via debugger) for mytbl1 and mytbl2 in LogicalRepSyncTableStart
before it changes the relstate to SUBREL_STATE_SYNCWAIT, then allowed
Tx2 and Tx3 to be processed by the apply worker, and then allowed the
tablesync worker for mytbl2 to proceed. After that, I can see the above
state.
Now, won't this behavior be considered transaction inconsistency,
where partial transaction data or later transaction data is visible? I
don't think we can have such a situation on the master (publisher)
node or on a physical standby.
--
With Regards,
Amit Kapila.
On Fri, Dec 4, 2020 at 10:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
..
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE TABLE mytbl2(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

Tx1
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl2(somedata, text) VALUES (1, 1);
COMMIT;

CREATE PUBLICATION mypublication FOR TABLE mytbl;
oops, the above statement should be CREATE PUBLICATION mypublication
FOR TABLE mytbl1, mytbl2;
--
With Regards,
Amit Kapila.
On Fri, Dec 4, 2020 at 10:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
..
Apart from the above, I think with the current design of tablesync we
can see partial data of transactions because we allow all the
tablesync workers to run in parallel.
..
Basically, the results for Tx1, Tx2, Tx3 are visible for mytbl2 but
not for mytbl1.
..
Now, won't this behavior be considered transaction inconsistency,
where partial transaction data or later transaction data is visible? I
don't think we can have such a situation on the master (publisher)
node or on a physical standby.
On briefly checking the pglogical code [1], it seems this problem
won't be there in pglogical, because it seems to first copy all the
tables (via pglogical_sync_table) in one process and then catch up with
the apply worker in a transaction-by-transaction manner. Am I reading
it correctly? If so, then why did we follow a different approach for
the in-core solution? Or is it that pglogical has improved over time
but the improvements can't all be implemented in-core because of some
missing features?
[1]: https://github.com/2ndQuadrant/pglogical
--
With Regards,
Amit Kapila.
On Thu, Dec 3, 2020 at 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
..
It is not clear what you mean by the above. The way you have written
it, it appears you are saying that instead of copying the initial data,
I am saying to copy it transaction-by-transaction. But that is not the
case. I am saying copy the initial data by using REPEATABLE READ
isolation level as we are doing now, commit it and then process
transaction-by-transaction till we reach the sync point (the point up
to which the apply worker has already received the data).

Craig in his mail has clarified this. The changes after the initial
COPY will be visible before the table sync catches up.

What is the problem even if the user is able to see the data after the
initial copy?

That will not violate consistency, I think.

I am not sure how consistency will be broken.
Some of the transactions applied by the apply worker may not have been
applied by the resync and vice versa. If the intermediate states of the
table resync worker are visible, this difference in applied
transactions will result in a loss of consistency if those transactions
change the table being resynced and some other table in the same
transaction. The changes won't be atomically visible. Thinking more
about this, this problem exists today for a table being resynced, but
at least it's only the table being resynced that is behind the other
tables, so it's predictable.
--
Best Wishes,
Ashutosh Bapat
On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
..
Craig in his mail has clarified this. The changes after the initial
COPY will be visible before the table sync catches up.
I think the problem is not that the changes are visible after COPY;
rather, it is that we don't have a mechanism to restart if it crashes
after COPY unless we do all the sync up in one transaction. Assume we
commit after COPY and then process transaction-by-transaction, and it
errors out (due to connection loss) or crashes in-between one of the
following transactions after COPY; then after the restart we won't know
from where to start for that relation. This is because the catalog
(pg_subscription_rel) will show the state as 'd' (data is being
copied) and the slot would have gone as it was a temporary slot. But
as mentioned in one of my emails above [1] we can solve these problems,
which Craig also seems to be advocating for, as there are many
advantages of not doing the entire sync (initial copy + stream changes
for that relation) in one single transaction. It will allow us to
support decoding of prepared xacts on the subscriber. Also, it seems
pglogical already does processing transaction-by-transaction after the
initial copy. The only thing which is not clear to me is why we didn't
decide to go that way initially, and it would probably be better
if the original authors would also chime in to at least clarify the
same.
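
To make the failure mode concrete, the per-relation sync state can be
inspected on the subscriber with a query like the one below (standard
catalogs, nothing hypothetical). After such a crash, the relation would
be stuck in state 'd' with no slot left to resume from:

-- srsubstate: 'i' = initialize, 'd' = data is being copied,
-- 's' = synchronized, 'r' = ready (caught up with the apply worker)
SELECT srrelid::regclass AS relation, srsubstate
FROM pg_subscription_rel ORDER BY 1;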
..
Some of the transactions applied by the apply worker may not have been
applied by the resync and vice versa.
..
Thinking more about this, this problem exists today for a table being
resynced, but at least it's only the table being resynced that is
behind the other tables, so it's predictable.
Yeah, I have already shown that this problem [1] exists today, and it
won't be predictable when the number of tables to be synced is larger.
I am not sure why, but it seems acceptable to the original authors that
the data of transactions is visible partially during the initial
synchronization phase of a subscription. I don't see it documented
clearly either.
[1]: /messages/by-id/CAA4eK1Ld9XaLoTZCoKF_gET7kc1fDf8CPR3CM48MQb1N1jDLYg@mail.gmail.com
--
With Regards,
Amit Kapila.
On Sat, 5 Dec 2020, 10:03 Amit Kapila, <amit.kapila16@gmail.com> wrote:
On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
..
I think the problem is not that the changes are visible after COPY;
rather, it is that we don't have a mechanism to restart if it crashes
after COPY unless we do all the sync up in one transaction.
..
The only thing which is not clear to me is why we didn't
decide to go that way initially, and it would probably be better
if the original authors would also chime in to at least clarify the
same.
It's partly a resource management issue.
Replication origins are a limited resource. We need to use a replication
origin for any sync we want to be durable across restarts.
Then again, so are slots, and we use temp slots for each sync.
If a sync fails, cleanup on the upstream side is simple with a temp slot.
With persistent slots we have more risk of creating upstream issues. But
then, so long as the subscriber exists it can deal with that. And if the
subscriber no longer exists, its primary slot is an issue too.
It'd help if we could register pg_shdepend entries between catalog entries
and slots, and from a main subscription slot to any extra slots used for
resynchronization.
And I should write a patch for a resource retention summarisation view.
I am not sure why, but it seems acceptable to the original authors that
the data of transactions is visible partially during the initial
synchronization phase of a subscription.
I don't think there's much alternative there.
Pg would need some kind of cross-commit visibility control mechanism
that separates durable commit from visibility.
Hi,
I wanted to float another idea to solve these tablesync/apply worker problems.
This idea may or may not have merit. Please consider it.
~
Basically, I was wondering why can't the "tablesync" worker just
gather messages in a similar way to how the current streaming feature
gathers messages into a "changes" file, so that they can be replayed
later.
e.g. Imagine if
A) The "tablesync" worker (after the COPY) does not ever apply any of
the incoming messages, but instead it just gobbles them into a
"changes" file until it decides it has reached SYNCDONE state and
exits.
B) Then, when the "apply" worker proceeds, if it detects the existence
of the "changes" file it will replay/apply_dispatch all those gobbled
messages before just continuing as normal.
So
- IIUC this kind of replay is like how the current code stream commit
applies the streamed "changes" file.
- "tablesync" worker would only be doing table sync (COPY) as its name
suggests. Any detected "changes" are recorded and left for the "apply"
worker to handle.
- "tablesync" worker would just operate in single tx with a temporary
slot as per current code
- Then the "apply" worker would be the *only* worker that actually
applies anything. (as its name suggests)
Thoughts?
---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
On Sat, 5 Dec 2020, 10:03 Amit Kapila, <amit.kapila16@gmail.com> wrote:
On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
..

It's partly a resource management issue.

Replication origins are a limited resource. We need to use a
replication origin for any sync we want to be durable across restarts.

Then again, so are slots, and we use temp slots for each sync.

If a sync fails, cleanup on the upstream side is simple with a temp
slot. With persistent slots we have more risk of creating upstream
issues. But then, so long as the subscriber exists it can deal with
that. And if the subscriber no longer exists, its primary slot is an
issue too.
I think if the only issue is slot cleanup, then the same exists today
for the slot created by the apply worker (or what I think you are
referring to as a primary slot). This can only happen if the
subscriber goes away without dropping the subscription. Also, if we
are worried about using up too many slots, then the slots used by
tablesync workers will probably be freed sooner.
It'd help if we could register pg_shdepend entries between catalog entries and slots, and from a main subscription slot to any extra slots used for resynchronization.
Which catalog entries are you referring to here?
And I should write a patch for a resource retention summarisation view.
That would be great.
I am not sure why, but it seems acceptable to the original authors that
the data of transactions is visible partially during the initial
synchronization phase of a subscription.

I don't think there's much alternative there.
I am not sure about this. I think it is primarily to allow some more
parallelism among the apply and sync workers. One primitive way to
achieve parallelism and avoid this problem is to allow the apply worker
to wait till all the tablesync workers are in the DONE state. Then we
will never have the inconsistency problem or the prepared-xact problem.
Now, surely if large copies are required for multiple relations then we
would delay the replay of transactions by the apply worker a bit, but I
don't know how much that matters compared to the transaction
visibility issue; and anyway, we would have achieved the maximum
parallelism by allowing the copy via multiple workers.
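
For illustration, the "all tablesync workers are done" condition that
the apply worker would wait on is essentially what this subscriber-side
query checks (a sketch; the subscription name is made up):

SELECT bool_and(srsubstate IN ('s', 'r')) AS all_synced
FROM pg_subscription_rel
WHERE srsubid = (SELECT oid FROM pg_subscription
                 WHERE subname = 'mysub');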
--
With Regards,
Amit Kapila.
On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250@gmail.com> wrote:
Basically, I was wondering why can't the "tablesync" worker just
gather messages in a similar way to how the current streaming feature
gathers messages into a "changes" file, so that they can be replayed
later.
See the related thread "Logical archiving"
/messages/by-id/20D9328B-A189-43D1-80E2-EB25B9284AD6@yandex-team.ru
where I addressed some parts of this topic in detail earlier today.
A) The "tablesync" worker (after the COPY) does not ever apply any of
the incoming messages, but instead it just gobbles them into a
"changes" file until it decides it has reached SYNCDONE state and
exits.
This has a few issues.
Most importantly, the sync worker must cooperate with the main apply worker
to achieve a consistent end-of-sync cutover. The sync worker must have
replayed the pending changes in order to make this cut-over, because the
non-sync apply worker will need to start applying changes on top of the
resync'd table potentially as soon as the next transaction it starts
applying, so it needs to see the rows there.
Doing this would also add another round of write multiplication since the
data would get spooled then applied to WAL then heap. Write multiplication
is already an issue for logical replication so adding to it isn't
particularly desirable without a really compelling reason. With the write
multiplication comes disk space management issues for big transactions as
well as the obvious performance/throughput impact.
It adds even more latency between upstream commit and downstream apply,
something that is again already an issue for logical replication.
Right now we don't have any concept of a durable and locally flushed spool.
It's not impossible to do as you suggest but the cutover requirement makes
it far from simple. As discussed in the logical archiving thread I think
it'd be good to have something like this, and there are times the write
multiplication price would be well worth paying. But it's not easy.
B) Then, when the "apply" worker proceeds, if it detects the existence
of the "changes" file it will replay/apply_dispatch all those gobbled
messages before just continuing as normal.
That's going to introduce a really big stall in the apply worker's progress
in many cases. During that time it won't be receiving from upstream (since
we don't spool logical changes to disk at this time) so the upstream lag
will grow. That will impact synchronous replication, pg_wal size
management, catalog bloat, etc. It'll also leave the upstream logical
decoding session idle, so when it resumes it may create a spike of I/O and
CPU load as it catches up, as well as a spike of network traffic. And
depending on how close the upstream write rate is to the max decode speed,
network throughput max, and downstream apply speed max, it may take some
time to catch up over the resulting lag.
Not a big fan of that approach.
On Mon, Dec 7, 2020 at 10:02 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250@gmail.com> wrote:

Basically, I was wondering why can't the "tablesync" worker just
gather messages in a similar way to how the current streaming feature
gathers messages into a "changes" file, so that they can be replayed
later.

See the related thread "Logical archiving"
/messages/by-id/20D9328B-A189-43D1-80E2-EB25B9284AD6@yandex-team.ru
where I addressed some parts of this topic in detail earlier today.
..
This has a few issues.

Most importantly, the sync worker must cooperate with the main apply
worker to achieve a consistent end-of-sync cutover.
In this idea, there is no need to change the end-of-sync cutover. It
will work as it is now. I am not sure what makes you think so.
The sync worker must have replayed the pending changes in order to make this cut-over, because the non-sync apply worker will need to start applying changes on top of the resync'd table potentially as soon as the next transaction it starts applying, so it needs to see the rows there.
The change here would be that the apply worker will check for the
changes file and, if it exists, apply the changes from it before it
changes the relstate to SUBREL_STATE_READY in
process_syncing_tables_for_apply(). So, it will not miss seeing any
rows.
Doing this would also add another round of write multiplication since the data would get spooled then applied to WAL then heap. Write multiplication is already an issue for logical replication so adding to it isn't particularly desirable without a really compelling reason.
It will solve our problem of allowing decoding of prepared xacts in
pgoutput. I have explained the problem above [1]. The other idea which
we discussed is to allow having an additional state in
pg_subscription_rel, make the slot permanent in the tablesync worker,
and then process transaction-by-transaction in the apply worker. Does
that approach sound better? Is there any bigger change involved in this
approach (making the tablesync slot permanent) which I am missing?
With the write multiplication comes disk space management issues for big transactions as well as the obvious performance/throughput impact.
It adds even more latency between upstream commit and downstream apply, something that is again already an issue for logical replication.
Right now we don't have any concept of a durable and locally flushed spool.
I think we have a concept quite close to it for writing changes for
in-progress xacts as done in PG-14. It is not durable but that
shouldn't be a big problem if we allow syncing the changes file.
It's not impossible to do as you suggest but the cutover requirement makes it far from simple. As discussed in the logical archiving thread I think it'd be good to have something like this, and there are times the write multiplication price would be well worth paying. But it's not easy.
B) Then, when the "apply" worker proceeds, if it detects the existence
of the "changes" file it will replay/apply_dispatch all those gobbled
messages before just continuing as normal.

That's going to introduce a really big stall in the apply worker's
progress in many cases. During that time it won't be receiving from
upstream (since we don't spool logical changes to disk at this time) so
the upstream lag will grow. That will impact synchronous replication,
pg_wal size management, catalog bloat, etc. It'll also leave the
upstream logical decoding session idle, so when it resumes it may
create a spike of I/O and CPU load as it catches up, as well as a spike
of network traffic. And depending on how close the upstream write rate
is to the max decode speed, network throughput max, and downstream
apply speed max, it may take some time to catch up over the resulting
lag.
This is just for the initial tablesync phase. I think it is equivalent
to saying that during a basebackup we need to start physical
replication in parallel. I agree that sometimes it can take a lot of
time to copy large tables, but it will be just one time and no worse
than other situations like basebackup.
[1]: /messages/by-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ@mail.gmail.com
--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
..
I don't think there's much alternative there.

I am not sure about this. I think it is primarily to allow some more
parallelism among the apply and sync workers. One primitive way to
achieve parallelism and avoid this problem is to allow the apply worker
to wait till all the tablesync workers are in the DONE state.
As the slot of the apply worker is created before all the tablesync
workers, it should never miss any LSN which the tablesync workers would
have processed. Also, the tablesync workers should not process any
xact if the apply worker has not processed anything. I think tablesync
currently always processes one transaction (because we call
process_sync_tables at the commit of a txn) even if that is not
required to be in sync with the apply worker. This should solve both
problems: (a) visibility of partial transactions, (b) allowing prepared
transactions, because the tablesync worker no longer needs to combine
multiple transactions' data.

I think the other advantage of this would be that it would reduce the
load (both CPU and I/O) on the publisher side by decoding the data only
once, instead of once for each tablesync worker and separately for the
apply worker. I think it will use fewer resources to finish the work.
Is there any flaw in this idea which I am missing?
--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
..
As the slot of the apply worker is created before all the tablesync
workers, it should never miss any LSN which the tablesync workers would
have processed. Also, the tablesync workers should not process any
xact if the apply worker has not processed anything. I think tablesync
currently always processes one transaction (because we call
process_sync_tables at the commit of a txn) even if that is not
required to be in sync with the apply worker.
One more thing to consider here is that currently in the tablesync
worker, we create a slot with the CRS_USE_SNAPSHOT option, which
creates a transaction snapshot on the publisher, and then we use the
same snapshot for the COPY from the publisher. After this, when we
receive the data from the publisher using the same slot, it will be in
sync with the COPY. To keep the same consistency between the COPY and
the data we receive from the publisher in this approach, I think we
need to export the snapshot while creating the slot in the apply worker
by using CRS_EXPORT_SNAPSHOT, and then have all the tablesync workers
doing the copy use that same snapshot. In the tablesync workers, we can
use the SET TRANSACTION SNAPSHOT command after "BEGIN READ ONLY
ISOLATION LEVEL REPEATABLE READ" to achieve it. That way the COPY will
use the same snapshot as is used for receiving the changes in the apply
worker, and the data will be in sync.
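
A rough sketch of the SQL-level sequence this would map to (the
snapshot identifier below is made up; a real one is returned when the
slot is created, and the exporting transaction must still be open on
the apply worker's connection when the tablesync workers import it):

-- Apply worker's publisher connection: create the main slot and
-- export its snapshot (what CRS_EXPORT_SNAPSHOT corresponds to).
CREATE_REPLICATION_SLOT "mysub" LOGICAL pgoutput EXPORT_SNAPSHOT;
-- ... returns a snapshot name, say '00000003-00000002-1'

-- Each tablesync worker then copies under that same snapshot:
BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-00000002-1';
COPY public.mytbl1 TO STDOUT;
COMMIT;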
--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 7:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
As the slot of the apply worker is created before all the tablesync
workers, it should never miss any LSN which the tablesync workers would
have processed.
..
I think the other advantage of this would be that it would reduce the
load (both CPU and I/O) on the publisher side by decoding the data only
once, instead of once for each tablesync worker and separately for the
apply worker. I think it will use fewer resources to finish the work.
Yes, I observed this same behavior.
IIUC the only way for the tablesync worker to go from CATCHUP mode to
SYNCDONE is via the call to process_sync_tables.
But a side-effect of this is that when messages arrive during this
CATCHUP phase, one tx will get handled by the tablesync worker before
process_sync_tables() is ever encountered.
I have created and attached a simple patch which allows the tablesync
worker to detect if there is anything to do *before* it enters the
apply main loop. Calling process_sync_tables() before the apply main
loop offers a quick way out, so the message handling will not be split
unnecessarily between the workers.
~
The result of the patch is demonstrated by the following test/logs
which are also attached.
Note: I added more logging (not in this patch) to make it easier to
see what is going on.
LOGS1. Current code.
Test: 10 x INSERTS done at CATCHUP time.
Result: tablesync worker does 1 x INSERT, then apply worker skips 1
and does remaining 9 x INSERTs.
LOGS2. Patched code.
Test: Same 10 x INSERTS done at CATCHUP time.
Result: tablesync can exit early. apply worker handles all 10 x INSERTs
LOGS3. Patched code.
Test: 2PC PREPARE then COMMIT PREPARED [1] done at CATCHUP time
psql -d test_pub -c "BEGIN;INSERT INTO test_tab VALUES(1,
'foo');PREPARE TRANSACTION 'test_prepared_tab';"
psql -d test_pub -c "COMMIT PREPARED 'test_prepared_tab';"
Result: The PREPARE and COMMIT PREPARED are both handled by the apply
worker. This avoids the complications which the split otherwise causes.
[1]: 2PC prepare test requires the v29 patch from
/messages/by-id/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
tablesync-early-exit.patchapplication/octet-stream; name=tablesync-early-exit.patchDownload
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 7a6c594..b05d811 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2322,6 +2322,15 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
bool ping_sent = false;
TimeLineID tli;
+ /*
+ * Give the tablesync worker an opportunity to see if it can exit instead of
+ * handling messages which the apply worker could just handle by itself.
+ */
+ if (am_tablesync_worker())
+ {
+ process_syncing_tables(last_received);
+ }
+
/*
* Init the ApplyMessageContext which we clean up after each replication
* protocol message.
On Tue, Dec 8, 2020 at 11:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
Yeah, this demonstrates the idea can work, but as mentioned in my
previous email [1] this needs much more work to make the COPY and the
later fetching of changes from the publisher consistent. So, let me
summarize the discussion so far. We wanted to enhance the tablesync
phase of Logical Replication to enable decoding of prepared
transactions [2]. The problem was that when we stream prepared
transactions in the tablesync worker, it will simply commit them due
to the requirement of maintaining a single transaction for the entire
duration of the copy and the streaming of transactions afterward. We can't
simply disable the decoding of prepared xacts for tablesync workers
because that can skip some of the prepared xacts forever on the
subscriber, as explained in one of the emails above [3]. Now, while investigating
the solutions to enhance tablesync to support decoding at prepare
time, I found that due to the current design of tablesync we can see
partial data of transactions on subscribers which is also explained in
the email above with an example [4]. This visibility problem has
existed since logical replication was introduced in PostgreSQL, and
the only answer I have received so far is that there doesn't seem to
be any other alternative, which I think is not true; I have provided
one alternative as well.
Next, we have discussed three different solutions all of which will
solve the first problem (allow the tablesync worker to decode
transactions at prepare time) and one of which solves both the first
and second problem (partial transaction data visibility).
Solution-1: Allow the table-sync worker to use multiple transactions.
The reason for doing it in a single transaction is that if we commit
after the initial COPY and then crash while streaming changes of other
transactions, the state of the table won't be known after the restart:
since we are using a temporary slot, we don't know from where to
restart syncing the table.
IIUC, we primarily need to do two things to achieve multiple
transactions: one is to have an additional state in the catalog (say
'catchup') which says that the initial copy is done. Then we need a
permanent slot whose progress we can track, so that after a restart
(due to a crash, connection break, etc.) we can start from the
appropriate position. Now, this will allow us to do
less work after recovering from a crash because we will know the
restart point. As Craig mentioned, it also allows the sync slot to
advance, freeing any held upstream resources before the whole sync is
done, which is good if the upstream is busy and generating lots of
WAL. Finally, committing as we go means we won't exceed the cid
increment limit in a single txn.
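As a rough sketch of how this extra state would surface, the per-table
sync state can already be inspected in pg_subscription_rel on the
subscriber; the 'C' (copy done) value shown below is only from the WIP
patch further down this thread and does not exist in current releases:

SELECT srrelid::regclass AS relation, srsubstate, srsublsn
FROM pg_subscription_rel;
-- srsubstate: 'i' init, 'd' data sync, 'C' copy done (proposed),
--             's' sync done, 'r' ready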
Solution-2: The next solution we discussed is to make "tablesync"
worker just gather messages after COPY in a similar way to how the
current streaming of in-progress transaction feature gathers messages
into a "changes" file so that they can be replayed later by the apply
worker. Now, as we no longer need to replay the individual
transactions in the tablesync worker inside a single transaction, the
publisher can send the decoded prepare to the subscriber. This has
some disadvantages: each transaction processed by the tablesync worker
needs to be durably written to a file, and it can also lead to some
apply lag later when the apply worker processes the same changes.
Solution-3: Allow the table-sync workers to just perform initial COPY
and then once the COPY is done for all relations the apply worker will
stream all the future changes. Now, surely if large copies are
required for multiple relations then the apply worker's replay of
transactions would be delayed a bit, but it is not clear how much that
matters compared to the transaction visibility issue, and anyway we
would have achieved the maximum parallelism by allowing the copy via
multiple workers. This would reduce the load (both CPU and I/O) on the
publisher side by decoding the data only once instead of once for each
tablesync worker and separately for the apply worker. I think it will
use fewer resources to finish the work.
Currently, in tablesync worker, we create a slot with CRS_USE_SNAPSHOT
option which creates a transaction snapshot on the publisher, and then
we use the same snapshot for COPY from the publisher. After this, when
we try to receive the data from the publisher using the same slot, it
will be in sync with the COPY. I think to keep the same consistency
between COPY and the data we receive from the publisher in this
approach, we need to export the snapshot while creating a slot in the
apply worker by using CRS_EXPORT_SNAPSHOT and then use the same
snapshot by all the tablesync workers doing the copy. In tablesync
workers, we can use the SET TRANSACTION SNAPSHOT command after "BEGIN
READ ONLY ISOLATION LEVEL REPEATABLE READ" to use the exported
snapshot. That way the COPY will use the same snapshot as is used for
receiving the changes in apply worker and the data will be in sync.
Then we also need a way to export a snapshot while the apply worker is
already receiving changes, because users can run 'ALTER SUBSCRIPTION
name REFRESH PUBLICATION', which allows new tables to be synced. I
think we need to introduce a new command in exec_replication_command()
to export a snapshot from the existing slot, which the new tablesync
worker can then use.
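For reference, a sketch of the publisher-side protocol commands this
would build on; the slot name is illustrative, and a command to export
a snapshot from an already-existing slot does not exist today - that
is exactly the new command proposed above:

-- on a replication connection, e.g. psql "dbname=postgres replication=database"
CREATE_REPLICATION_SLOT "sub1_slot" LOGICAL pgoutput EXPORT_SNAPSHOT;
-- returns: slot_name | consistent_point | snapshot_name | output_plugin
-- the snapshot_name would then be handed to each tablesync worker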
Among the above three solutions, the first two will solve the first
problem (allow the tablesync worker to decode transactions at prepare
time) and the third solution will solve both the first and second
problem (partial transaction data visibility). The third solution
requires quite some redesign of how the Logical Replication work is
synchronized between apply and tablesync workers and might turn out to
be a bigger implementation effort. I am tentatively thinking of going
with the first or second solution at this stage; if people later feel
that we need some bigger redesign, then we can go with something along
the lines of Solution-3.
Thoughts?
[1]: /messages/by-id/CAA4eK1+QC74wRQmbYT+MmOs=YbdUjuq0_A9CBbVoQMB1Ryi-OA@mail.gmail.com
[2]: /messages/by-id/CAHut+PuEMk4SO8oGzxc_ftzPkGA8uC-y5qi-KRqHSy_P0i30DA@mail.gmail.com
[3]: /messages/by-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ@mail.gmail.com
[4]: /messages/by-id/CAA4eK1Ld9XaLoTZCoKF_gET7kc1fDf8CPR3CM48MQb1N1jDLYg@mail.gmail.com
--
With Regards,
Amit Kapila.
On Tue, Dec 8, 2020 at 9:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Hi Amit,
- Solution-3 has become too complicated to be attempted by me. Anyway,
we may be better off just focusing on eliminating the new problems
exposed by the 2PC work [1], rather than burning too much effort to
fix some other quirk which has apparently existed for years.
[1]: /messages/by-id/CAHut+Ptm7E5Jj92tJWPtnnjbNjJN60_=aGGKYW3h23b7J=qeDg@mail.gmail.com
- Solution-2 has some potential lag problems, and maybe file resource
problems as well. This idea did not get a very favourable response
when I first proposed it.
- This leaves Solution-1 as the best viable option to fix the current
known 2PC trouble.
~~
So I will try to write a patch for the proposed Solution-1.
---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Dec 10, 2020 at 3:19 PM Peter Smith <smithpb2250@gmail.com> wrote:
So I will try to write a patch for the proposed Solution-1.
Yeah, I also think that Solution-1 is best for solving the 2PC problem.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 8:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
So I will try to write a patch for the proposed Solution-1.
Hi Amit.
FYI, here is my v3 WIP patch for Solution-1.
This patch applies onto the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
Although incomplete, it continues to pass all of the make check and
src/test/subscription TAP tests.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
TODO / Known Issues:
* The tablesync replication origin/lsn logic all needs to be updated
so that tablesync knows where to restart based on information held by
the now permanent slot.
* the current implementation of tablesync drop slot (e.g. from DROP
SUBSCRIPTION) or finish_sync_worker regenerates the tablesync slot
name so it knows what slot to drop. The current code may be ok for
normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
(slot_name = newname) it would be unable to find the tablesync
slot. Some redesign may be needed for this part.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development
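As a quick way to observe the now-permanent tablesync slots noted
under "Coded / WIP" above, they should show up as non-temporary slots
on the publisher while a sync is in progress. A sketch (the slot name
follows the patch's "%s_%u_sync_%u" scheme, with illustrative OIDs):

SELECT slot_name, plugin, slot_type, temporary
FROM pg_replication_slots;
--       slot_name        |  plugin  | slot_type | temporary
-- sub1_16398_sync_16385  | pgoutput | logical   | f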
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v3-0002-2PC-Solution1-WIP-20201215.patchapplication/octet-stream; name=v3-0002-2PC-Solution1-WIP-20201215.patchDownload
From 9adb04bdb827f44a91e45d53b1fad5a02213777c Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 15 Dec 2020 20:46:27 +1100
Subject: [PATCH v3] 2PC-Solution1-WIP-20201215.
This patch applies onto the v30 patch set [1] from other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
TODO / Known Issues:
* The tablesync replication origin/lsn logic all needs to be updated so that tablesync knows where to restart based on information held by the now permanent slot.
* the current implementation of tablesync drop slot (e.g. from DROP SUBSCRIPTION) or finish_sync_worker regenerates the tablesync slot name so it knows what slot to drop. The current code may be ok for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would be unable to find the tablesync slot.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
---
src/backend/commands/subscriptioncmds.c | 108 ++++++++++++++++++
src/backend/replication/logical/tablesync.c | 163 ++++++++++++++++++++++------
src/backend/replication/logical/worker.c | 21 +---
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/commands/subscriptioncmds.h | 1 +
src/include/replication/slot.h | 1 +
6 files changed, 249 insertions(+), 46 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..e2b9618 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -1070,6 +1071,41 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /* Is this a tablesync worker? If yes, drop the tablesync's slot. */
+ if (OidIsValid(w->relid))
+ {
+ /* FIXME 1 - This slotname check below is a workaround needed because the tablesync slot name
+ * is derived from the subscription slot name, so if that was set slot_name = NONE then we cannot do
+ * that calculation anymore to get the tablesync slot name.
+ *
+ * FIXME 2 - If the subscription slot name changes from 'aaa' to 'bbb' then it will not be possible
+ * to get back to those tablesync slots. Some redesign is needed (e.g. store the tablesync slot name somewhere)
+ * to avoid this trouble...
+ */
+ if (slotname)
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(slotname, w->subid, w->relid);
+
+ elog(LOG, "!!>> DROP SUBSCRIPTION - now dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
+
+ pfree(syncslotname);
+ }
+ else
+ {
+ elog(LOG, "!!>> DROP SUBSCRIPTION - no slotname for relid %u.", w->relid);
+ }
+ }
+
+ /* Stop the worker. */
logicalrep_worker_stop(w->subid, w->relid);
}
list_free(subworkers);
@@ -1144,6 +1180,78 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
table_close(rel, NoLock);
}
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ *
+ * If the connection is passed then just use that,
+ * otherwise connect/disconnect within this function.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname)
+{
+ StringInfoData cmd;
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
+
+ /*
+ * If the connection was passed then use it.
+ * If the connection was not passed then make a new connection using the passed conninfo.
+ */
+ if (wrconn_given != NULL)
+ {
+ Assert (conninfo == NULL);
+ wrconn = wrconn_given;
+ }
+ else
+ {
+ char *err = NULL;
+
+ Assert(conninfo != NULL);
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err)));
+ }
+
+ PG_TRY();
+ {
+ WalRcvExecResult *res;
+
+ res = walrcv_exec(wrconn, cmd.data, 0, NULL);
+
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ else
+ ereport(LOG,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+
+ walrcv_clear_result(res);
+ }
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
+
+ /* Disconnect the connection (unless using one passed) */
+ if (wrconn_given == NULL)
+ walrcv_disconnect(wrconn);
+
+ pfree(cmd.data);
+}
+
/*
* Internal workhorse for changing a subscription owner
*/
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..7378cb6 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,7 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +140,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->slotname,
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ elog(LOG, "!!>> Dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name, syncslotname);
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +293,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +305,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -808,6 +838,35 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(char *subslotname, Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
+ * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
+ * NAMEDATALEN on the remote that matters, but this scheme will also work
+ * reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("%.*s_%u_sync_%u",
+ NAMEDATALEN - 28,
+ subslotname,
+ suboid,
+ relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +884,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,16 +910,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
+ /* Calculate the name of the tablesync slot */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->slotname,
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +927,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ elog(LOG, "!!>> tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +954,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +979,70 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart calls walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ if (!copied_ok)
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ elog(LOG, "!!>> The tablesync copy failed. Drop the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name,
+ slotname);
+
+ pfree(slotname);
+ }
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+ CommitTransactionCommand();
+
+ /* Update the persisted state to flag COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
- /* Make the copy visible. */
- CommandCounterIncrement();
+copy_table_done:
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index 804e47b..82c09d1 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -27,3 +27,4 @@ extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
extern void AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId);
#endif /* SUBSCRIPTIONCMDS_H */
+
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..366a737 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -211,6 +211,7 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(char *subslotname, Oid suboid, Oid relid);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v3-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patchapplication/octet-stream; name=v3-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patchDownload
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v3] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
Hi Amit.
PSA my v4 WIP patch for Solution-1.
This patch applies onto the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
Although incomplete, it still passes all of the make check and
src/test/subscription TAP tests.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync now sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker)
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* the current implementation of tablesync drop slot (e.g. from
DropSubscription or finish_sync_worker) regenerates the tablesync slot
name so it knows what slot to drop. The current code might be ok for
normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
(slot_name = newname) it would be unable to find the tablesync
slot.
* I think if there are crashed tablesync workers then they are not
known to DropSubscription, so it might be a problem to clean up slots
and/or origin tracking belonging to those unknown workers.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development
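As a sketch of how the new origin tracking noted under "Coded / WIP"
above could be observed, the subscriber's pg_replication_origin should
show a per-table origin alongside the subscription's own; the names
follow the existing "pg_%u" (subid) scheme and the patch's "pg_%u_%u"
(subid, relid) scheme, with illustrative OIDs:

SELECT roname FROM pg_replication_origin;
--     roname
-- pg_16394         (apply worker's origin)
-- pg_16394_16385   (tablesync origin added by this WIP patch)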
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v4-0002-WIP-patch-for-Solution1.patchapplication/octet-stream; name=v4-0002-WIP-patch-for-Solution1.patchDownload
From 8867bd8756d6b31b015ac39ae36d5bdf4146db0e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Sat, 19 Dec 2020 00:01:41 +1100
Subject: [PATCH v4] WIP patch for Solution1.
This patch applies onto the v30 patch set [1] from other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
Although incomplete, it still passes all of the make check and src/test/subscription TAP tests.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync now sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker)
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* the current implementation of tablesync drop slot (e.g. from DropSubscription or finish_sync_worker) regenerates the tablesync slot name so it knows what slot to drop. The current code might be ok for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would be unable to find the tablesync slot.
* I think if there are crashed tablesync workers then they are not known to DropSubscription, so it might be a problem to clean up slots and/or origin tracking belonging to those unknown workers.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
---
src/backend/commands/subscriptioncmds.c | 124 +++++++++++++++++
src/backend/replication/logical/tablesync.c | 207 +++++++++++++++++++++++-----
src/backend/replication/logical/worker.c | 21 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 1 +
5 files changed, 307 insertions(+), 47 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..c4b02a6 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -47,6 +48,8 @@
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
*
@@ -1070,6 +1073,55 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Is this a tablesync worker? If yes, drop the tablesync's slot.
+ */
+ if (OidIsValid(w->relid))
+ {
+ /*
+ * FIXME 1 - This slotname check below is a workaround needed because the tablesync slot
+ * name is derived from the subscription slot name, so if that is set slot_name = NONE
+ * then we cannot do that calculation anymore to get the tablesync slot name.
+ *
+ * FIXME 2 - Similarly, if the subscription slot name changes from 'aaa' to 'bbb' then that
+ * will also make it impossible to re-calculate the tablesync slot names. Some redesign is
+ * needed (e.g. store the tablesync slot name somewhere) to avoid this trouble...
+ *
+ * FIXME 3 - Crashed tablesync workers may also have remaining slots because I don't think
+ * such workers are even iterated by this loop, and nobody else is removing them.
+ */
+ if (slotname)
+ {
+ /* Calculate the name of the tablesync slot. */
+ char *syncslotname = ReplicationSlotNameForTablesync(slotname, w->subid, w->relid);
+
+ elog(LOG, "!!>> DropSubscription: now dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
+
+ pfree(syncslotname);
+ }
+ else
+ {
+ elog(LOG, "!!>> DropSubscription: no slotname for relid %u.", w->relid);
+ }
+
+ /* Remove the (tablesync's) origin tracking if it exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, w->relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+
+ }
+
+ /* Stop the worker. */
logicalrep_worker_stop(w->subid, w->relid);
}
list_free(subworkers);
@@ -1144,6 +1196,78 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
table_close(rel, NoLock);
}
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ *
+ * If the connection is passed then just use that,
+ * otherwise connect/disconnect within this function.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname)
+{
+ StringInfoData cmd;
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
+
+ /*
+ * If the connection was passed then use it.
+ * If the connection was not passed then make a new connection using the passed conninfo.
+ */
+ if (wrconn_given != NULL)
+ {
+ Assert (conninfo == NULL);
+ wrconn = wrconn_given;
+ }
+ else
+ {
+ char *err = NULL;
+
+ Assert(conninfo != NULL);
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err)));
+ }
+
+ PG_TRY();
+ {
+ WalRcvExecResult *res;
+
+ res = walrcv_exec(wrconn, cmd.data, 0, NULL);
+
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ else
+ ereport(LOG,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+
+ walrcv_clear_result(res);
+ }
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
+
+ /* Disconnect the connection (unless using one passed) */
+ if (wrconn_given == NULL)
+ walrcv_disconnect(wrconn);
+
+ pfree(cmd.data);
+}
+
/*
* Internal workhorse for changing a subscription owner
*/
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..780cf8d 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->slotname,
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name, syncslotname);
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +294,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +306,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -416,6 +447,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then it would
+ * hang, because replorigin_drop considers the owning tablesync PID as "busy".
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
}
}
else
@@ -808,6 +860,35 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(char *subslotname, Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
+ * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
+ * NAMEDATALEN on the remote that matters, but this scheme will also work
+ * reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("%.*s_%u_sync_%u",
+ NAMEDATALEN - 28,
+ subslotname,
+ suboid,
+ relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +906,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,16 +932,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->slotname,
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +949,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +976,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +1001,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
+ CRS_USE_SNAPSHOT, NULL);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
- table_close(rel, NoLock);
+ /* Make the copy visible. */
+ CommandCounterIncrement();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ if (!copied_ok)
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Drop the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name,
+ slotname);
+
+ pfree(slotname);
+ }
+ }
+ PG_END_TRY();
+
+ CommitTransactionCommand();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ elog(LOG, "!!>> LogicalRepSyncTableStart: create replication origin tracking \"%s\".", originname);
+ originid = replorigin_create(originname);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: setup replication origin tracking \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..366a737 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -211,6 +211,7 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(char *subslotname, Oid suboid, Oid relid);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v4-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch (application/octet-stream)
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v4] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
TODO / Known Issues:
* the current implementation of tablesync drop slot (e.g. from
DropSubscription or finish_sync_worker) regenerates the tablesync slot
name so it knows what slot to drop.
If you always drop the slot at finish_sync_worker, in which case do
you need to drop it during DropSubscription? Is it when the tablesync
workers have crashed?
The current code might be ok for
normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
(slot_name = newname) it would be unable to find the tablesync slot.
Sure, but the same will be true for the apply worker slot as well. I
agree the problem would be more for table sync workers but I think we
can solve it, see below.
* I think if there are crashed tablesync workers then they are not
known to DropSubscription. So this might be a problem for cleaning up
slots and/or origin tracking belonging to those unknown workers.
Yeah, I think we can do two things to avoid this and the previous
problem. (a) We can generate the slot_name for the table sync worker
based on only subscription_id and rel_id. (b) Immediately after
creating the slot, advance the replication origin with the position
(origin_startpos) we get from walrcv_create_slot; this will help us
start from the right location.
Do you see anything which will still not be addressed after doing the above?
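To make (a) and (b) concrete, here is a rough sketch of the shape I
have in mind (untested; the helper name is only illustrative):

static void
tablesync_create_slot_and_origin(WalReceiverConn *wrconn, Oid suboid, Oid relid)
{
	char	   *syncslotname;
	char		originname[NAMEDATALEN];
	RepOriginId originid;
	XLogRecPtr	origin_startpos;

	/*
	 * (a) The name depends only on the subscription and relation OIDs,
	 * so DropSubscription can regenerate it without the worker's help.
	 */
	syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

	/* Create a permanent slot; the returned LSN is the consistent point. */
	walrcv_create_slot(wrconn, syncslotname, false,
					   CRS_USE_SNAPSHOT, &origin_startpos);

	/*
	 * (b) Advance the origin to that LSN immediately, so a restarted
	 * worker resumes from the right location.
	 */
	snprintf(originname, sizeof(originname), "pg_%u_%u", suboid, relid);
	originid = replorigin_create(originname);
	replorigin_advance(originid, origin_startpos, InvalidXLogRecPtr,
					   true /* go backward */ , true /* WAL log */ );

	pfree(syncslotname);
}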
I understand why you are trying to create this patch atop the logical
decoding of 2PC patch but I think it is better to create this as an
independent patch and then use it to test the 2PC problem. Also, please
explain what kind of testing you did to ensure that it works properly
after the table sync worker restarts after a crash.
--
With Regards,
Amit Kapila.
On Sat, Dec 19, 2020 at 12:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
I understand why you are trying to create this patch atop the logical
decoding of 2PC patch but I think it is better to create this as an
independent patch and then use it to test the 2PC problem. Also, please
explain what kind of testing you did to ensure that it works properly
after the table sync worker restarts after a crash.
Few other comments:
==================
1.
* FIXME 3 - Crashed tablesync workers may also have remaining slots
because I don't think
+ * such workers are even iterated by this loop, and nobody else is
removing them.
+ */
+ if (slotname)
+ {
The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?
2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}
With the above call, it will form a connection with the publisher and
drop the required slots. I think we need to save the connection info
so that we don't need to connect/disconnect for each slot to be
dropped. Later in this function, we again connect and drop the apply
worker slot. I think we should connect just once and drop both the
apply and tablesync slots, if any.
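A rough sketch of the shape I mean (untested; it assumes
ReplicationSlotDropAtPubNode is reworked to take an already-open
connection, and the function name is only illustrative):

static void
drop_sub_and_tablesync_slots(char *conninfo, char *subname,
							 char *slotname, List *subworkers)
{
	WalReceiverConn *conn;
	char	   *err = NULL;
	ListCell   *lc;

	/* Connect once and reuse the connection for every slot drop. */
	conn = walrcv_connect(conninfo, true, subname, &err);
	if (conn == NULL)
		ereport(ERROR,
				(errmsg("could not connect to publisher: %s", err)));

	/* Drop the tablesync slots, if any ... */
	foreach(lc, subworkers)
	{
		LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);

		if (OidIsValid(w->relid))
		{
			char	   *syncslotname =
				ReplicationSlotNameForTablesync(w->subid, w->relid);

			ReplicationSlotDropAtPubNode(conn, syncslotname);
			pfree(syncslotname);
		}
	}

	/* ... and then the apply worker slot, on the same connection. */
	if (slotname)
		ReplicationSlotDropAtPubNode(conn, slotname);

	walrcv_disconnect(conn);
}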
3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char
*conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
Why are we suppressing the error instead of handling it the same way
as we do while dropping the apply worker slot in DropSubscription?
4.
@@ -139,6 +141,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
This is not how we export functions at other places?
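(For example, declare it once in a header instead of a function-local
extern; a sketch:)

/* in src/include/replication/slot.h */
extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given,
										 char *conninfo, char *subname,
										 char *slotname);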
--
With Regards,
Amit Kapila.
Hi Amit.
PSA my v5 WIP patch for the Solution1.
This patch still applies onto the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
(I understand you would like this to be delivered as a separate patch
independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* I think if there are crashed tablesync workers they may not be known
to the current DropSubscription code. This might be a problem for
cleaning up slots and/or origin tracking belonging to those unknown
workers.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v5-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch (application/octet-stream)
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v5] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
v5-0002-WIP-patch-for-the-Solution1.patch (application/octet-stream)
From 7dbaa8c89085fbede1b21643635114810274613c Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 21 Dec 2020 20:01:02 +1100
Subject: [PATCH v5] WIP patch for the Solution1.
This patch applies onto the v30 patch set [1] from other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* I think if there are crashed tablesync workers they may not be currently known to DropSubscription. This might be a problem for cleaning up slots and/or origin tracking belonging to those unknown workers.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
* Address review comments
---
src/backend/commands/subscriptioncmds.c | 113 ++++++++++++++
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 219 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 21 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
6 files changed, 312 insertions(+), 49 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..e6594c8 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -47,6 +48,8 @@
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
*
@@ -1070,6 +1073,44 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Is this a tablesync worker?
+ *
+ * If yes, drop the tablesync's slot, and clean-up and remove replication origin tracking.
+ */
+ if (OidIsValid(w->relid))
+ {
+ /*
+ * FIXME - Crashed tablesync workers may also have remaining slots because I don't think
+ * such workers are even iterated by this loop, and nobody else is removing them.
+ */
+ {
+ /* Calculate the name of the tablesync slot. */
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, w->relid);
+
+ elog(LOG, "!!>> DropSubscription: now dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
+
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, w->relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+
+ }
+
+ /* Stop the worker. */
logicalrep_worker_stop(w->subid, w->relid);
}
list_free(subworkers);
@@ -1144,6 +1185,78 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
table_close(rel, NoLock);
}
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ *
+ * If the connection is passed then just use that,
+ * otherwise connect/disconnect within this function.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname)
+{
+ StringInfoData cmd;
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
+
+ /*
+ * If the connection was passed then use it.
+ * If the connection was not passed then make a new connection using the passed conninfo.
+ */
+ if (wrconn_given != NULL)
+ {
+ Assert (conninfo == NULL);
+ wrconn = wrconn_given;
+ }
+ else
+ {
+ char *err = NULL;
+
+ Assert(conninfo != NULL);
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err)));
+ }
+
+ PG_TRY();
+ {
+ WalRcvExecResult *res;
+
+ res = walrcv_exec(wrconn, cmd.data, 0, NULL);
+
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ else
+ ereport(LOG,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+
+ walrcv_clear_result(res);
+ }
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
+
+ /* Disconnect the connection (unless using one passed) */
+ if (wrconn_given == NULL)
+ walrcv_disconnect(wrconn);
+
+ pfree(cmd.data);
+}
+
/*
* Internal workhorse for changing a subscription owner
*/
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..5b349f5 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,24 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name, syncslotname);
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +290,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +302,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -416,6 +443,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then that will
+ * hang because the replorigin_drop considers the owning tablesync PID as "busy".
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
}
}
else
@@ -808,6 +856,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +899,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,17 +925,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +941,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +968,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +993,110 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ if (!copied_ok)
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Drop the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name,
+ slotname);
+
+ pfree(slotname);
+ }
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+ CommitTransactionCommand();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN got from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..bbc6b11 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Sat, Dec 19, 2020 at 5:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
TODO / Known Issues:
* the current implementation of tablesync drop slot (e.g. from
DropSubscription or finish_sync_worker) regenerates the tablesync slot
name so it knows what slot to drop.
If you always drop the slot at finish_sync_worker, in which case do
you need to drop it during DropSubscription? Is it when the tablesync
workers have crashed?
Yes. It is not the normal case. But if the tablesync never yet got to
SYNCDONE state (maybe crashed) then finish_sync_worker may not be
called.
So I think a rogue tablesync slot might still exist during DropSubscription.
The current code might be ok for
normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
(slot_name = newname) it would be unable to find the tablesync slot.
Sure, but the same will be true for the apply worker slot as well. I
agree the problem would be more for table sync workers but I think we
can solve it, see below.
* I think if there are crashed tablesync workers then they are not
known to DropSubscription. So this might be a problem for cleaning up
slots and/or origin tracking belonging to those unknown workers.
Yeah, I think we can do two things to avoid this and the previous
problem. (a) We can generate the slot_name for the table sync worker
based on only subscription_id and rel_id. (b) Immediately after
creating the slot, advance the replication origin with the position
(origin_startpos) we get from walrcv_create_slot; this will help us
start from the right location.
Do you see anything which will still not be addressed after doing the above?
(a) V5 Patch is updated as suggested.
(b) V5 Patch is updated as suggested. Now calling replorigin_advance.
No problems seen so far. All TAP tests pass, but more testing needed
for the origin stuff
I understand why you are trying to create this patch atop the logical
decoding of 2PC patch but I think it is better to create this as an
independent patch and then use it to test the 2PC problem.
OK. The latest patch still applies to v30 just for my convenience
today, but I will head towards converting this to an independent patch
ASAP.
Also, please
explain what kind of testing you did to ensure that it works properly
after the table sync worker restarts after a crash.
So far tested like this - I caused the tablesync to crash after
COPYDONE (but before SYNCDONE) by sending a row to cause a PK
violation while holding the tablesync at the CATCHUP state in the
debugger. The tablesync then handles the insert, encounters the PK
violation error, and re-launches. Then I remove the extra row so the
PK violation does not recur, and the (re-launched) tablesync can
complete and finish normally. The apply worker then takes over.
I have attached some captured/annotated logging of my test scenario
which I ran using the V4 patch (the log has a lot of extra temporary
output to help see what is going on)
---
Kind Regards,
Peter Smith.
Fujitsu Australia.
Attachments:
On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
==================
Thanks for your feedback.
1.
* FIXME 3 - Crashed tablesync workers may also have remaining slots
because I don't think
+ * such workers are even iterated by this loop, and nobody else is
+ removing them.
+ */
+ if (slotname)
+ {
The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?
Yes, normally if the tablesync can complete it should behave like that.
But I think there are other scenarios where it may be unable to
clean-up after itself. For example:
i) Maybe the crashed tablesync worker cannot finish, e.g. a row insert
handled by tablesync can give a PK violation, which will then crash
again and again for each re-launched/replacement tablesync worker.
This can be reproduced in the debugger. If the DropSubscription
doesn't clean-up the tablesync's slot then nobody will.
ii) Also DROP SUBSCRIPTION code has locking (see code comment) "to
ensure that the launcher doesn't restart new worker during dropping
the subscription". So executing DROP SUBSCRIPTION will prevent a newly
crashed tablesync from re-launching, so it won’t be able to take care
of its own slot. If the DropSubscription doesn't clean-up that
tablesync's slot then nobody will.
2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}
With the above call, it will form a connection with the publisher and
drop the required slots. I think we need to save the connection info
so that we don't need to connect/disconnect for each slot to be
dropped. Later in this function, we again connect and drop the apply
worker slot. I think we should connect just once and drop both the
apply and tablesync slots, if any.
OK. IIUC this is a suggestion for more efficient connection usage,
rather than an actual bug, right? I have added this suggestion to my TODO
list.
3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char
*conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
Why are we suppressing the error instead of handling it the same way
as we do while dropping the apply worker slot in DropSubscription?
This function is common - it is also called from the tablesync
finish_sync_worker. But in the finish_sync_worker case I wanted to
avoid throwing an ERROR which would cause the tablesync to crash and
relaunch (and crash/relaunch/repeat...) when all it was trying to do
in the first place was just cleanup and exit the process. Perhaps the
error suppression should be conditional depending on where this function
is called from?
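For illustration, one way could be a caller-supplied flag (a minimal
sketch, untested; the extra parameter is hypothetical):

void
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname,
							 bool fail_on_error)
{
	StringInfoData cmd;
	WalRcvExecResult *res;

	initStringInfo(&cmd);
	appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT",
					 quote_identifier(slotname));

	res = walrcv_exec(wrconn, cmd.data, 0, NULL);
	if (res->status != WALRCV_OK_COMMAND)
		/*
		 * DropSubscription would pass true and still get an ERROR; the
		 * finish_sync_worker cleanup path would pass false and only get
		 * a WARNING, so the tablesync does not crash/relaunch.
		 */
		ereport(fail_on_error ? ERROR : WARNING,
				(errmsg("could not drop the replication slot \"%s\" on publisher",
						slotname),
				 errdetail("The error was: %s", res->err)));

	walrcv_clear_result(res);
	pfree(cmd.data);
}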
4.
@@ -139,6 +141,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
This is not how we export functions at other places?
Fixed in latest v5 patch -
/messages/by-id/CAHut+PvmDJ_EO11_up=_cRbOjhdWCMG-n7kF-mdRhjtCHcjHRA@mail.gmail.com
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
==================Thanks for your feedback.
1.
* FIXME 3 - Crashed tablesync workers may also have remaining slots
because I don't think
+ * such workers are even iterated by this loop, and nobody else is
+ removing them.
+ */
+ if (slotname)
+ {
The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?
Yes, normally if the tablesync can complete it should behave like that.
But I think there are other scenarios where it may be unable to
clean-up after itself. For example:
i) Maybe the crashed tablesync worker cannot finish, e.g. a row insert
handled by tablesync can give a PK violation, which will then crash
again and again for each re-launched/replacement tablesync worker.
This can be reproduced in the debugger. If the DropSubscription
doesn't clean-up the tablesync's slot then nobody will.ii) Also DROP SUBSCRIPTION code has locking (see code commit) "to
ensure that the launcher doesn't restart new worker during dropping
the subscription".
Yeah, I have also read that comment but do you know how it is
preventing relaunch? How does the subscription lock help?
So executing DROP SUBSCRIPTION will prevent a newly
crashed tablesync from re-launching, so it won’t be able to take care
of its own slot. If the DropSubscription doesn't clean-up that
tablesync's slot then nobody will.
2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}
With the above call, it will form a connection with the publisher and
drop the required slots. I think we need to save the connection info
so that we don't need to connect/disconnect for each slot to be
dropped. Later in this function, we again connect and drop the apply
worker slot. I think we should connect just once and drop both the
apply and tablesync slots, if any.
OK. IIUC this is a suggestion for more efficient connection usage,
rather than an actual bug, right?
Yes, it is for effective connection usage.
I have added this suggestion to my TODO
list.
3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char
*conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
Why are we suppressing the error instead of handling it the same way
as we do while dropping the apply worker slot in DropSubscription?
This function is common - it is also called from the tablesync
finish_sync_worker. But in the finish_sync_worker case I wanted to
avoid throwing an ERROR which would cause the tablesync to crash and
relaunch (and crash/relaunch/repeat...) when all it was trying to do
in the first place was just cleanup and exit the process. Perhaps the
error suppression should be conditional depending on where this function
is called from?
Yeah, that could be one way and if you follow my previous suggestion
this function might change a bit more.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA my v6 WIP patch for the Solution1.
This patch still applies onto the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
(I understand you would like this to be delivered as a separate patch
independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* Crashed tablesync workers may not be known to the current
DropSubscription code. This might be a problem for cleaning up slots
and/or origin tracking belonging to those unknown workers.
* There seems to be a race condition during DROP SUBSCRIPTION. It
manifests as the TAP test 007 hanging. Logging shows it seems to be
during replorigin_drop when called from DropSubscription. It is timing
related and quite rare - e.g. it only happens about once in every ten
runs of the subscription TAP tests.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v6-0002-WIP-patch-for-the-Solution1.patch (application/octet-stream)
From f58809f0af819fa1efc2d4258b7434a88f9c4196 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 22 Dec 2020 21:57:36 +1100
Subject: [PATCH v6] WIP patch for the Solution1.
This patch still applies onto the v30 patch set [1] from other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
(I understand this should be delivered as a separate patch independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* Crashed tablesync workers may not be known to the current DropSubscription code. This might be a problem for cleaning up slots and/or origin tracking belonging to those unknown workers.
* There seems to be a race condition. It manifests as TAP test 007 hanging. Logging shows it seems to be during replorigin_drop when called from DropSubscription. It is timing related and quite rare - e.g. it only happens about once in every ten runs of the subscription TAP tests.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
* Address review comments
---
src/backend/commands/subscriptioncmds.c | 188 ++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 223 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 21 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
6 files changed, 334 insertions(+), 106 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..c557a62 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -960,7 +961,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1048,76 +1048,154 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ /*
+ * Is this a tablesync worker?
+ *
+ * If yes, drop the tablesync's slot, and remove replication origin tracking.
+ */
+ if (OidIsValid(w->relid))
+ {
+ char *syncslotname;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * FIXME - Crashed tablesync workers may also have remaining slots because I don't think
+ * such workers are even iterated by this loop, and nobody else is removing them.
+ */
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ syncslotname = ReplicationSlotNameForTablesync(subid, w->relid);
+ if (!wrconn)
+ {
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, then the DropSubscription will still be allowed to complete. But
+ * without a connection it is not possible to drop any tablesync slots.
+ *
+ * FIXME - So what to do? OK to just log a warning?
+ */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ pfree(syncslotname);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, w->relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+ }
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Stop the worker. */
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
- WalRcvExecResult *res;
+ WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
@@ -1135,13 +1213,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..9388a84 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,33 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +299,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +311,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -416,6 +452,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then that will
+ * hang because the replorigin_drop considers the owning tablesync PID as "busy".
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
}
}
else
@@ -808,6 +865,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +908,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,17 +934,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +950,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +977,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +1002,105 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
- table_close(rel, NoLock);
+ /* Make the copy visible. */
+ CommandCounterIncrement();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during the table copy then clean up the created slot. */
+ if (!copied_ok)
+ {
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
+
+ pfree(slotname);
+ slotname = NULL;
+ }
+ }
+ PG_END_TRY();
+
+ CommitTransactionCommand();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance it to the LSN obtained from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..5f19089 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v6-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v6] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
On Mon, Dec 21, 2020 at 11:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Few other comments:
> > > ==================
> >
> > Thanks for your feedback.
> >
> > > 1.
> > > * FIXME 3 - Crashed tablesync workers may also have remaining slots because I don't think
> > > + * such workers are even iterated by this loop, and nobody else is removing them.
> > > + */
> > > + if (slotname)
> > > + {
> > >
> > > The above FIXME is not clear to me. Actually, the crashed workers
> > > should restart, finish their work, and drop the slots. So not sure
> > > what exactly this FIXME refers to?
> >
> > Yes, normally if the tablesync can complete it should behave like that.
> > But I think there are other scenarios where it may be unable to
> > clean-up after itself. For example:
> >
> > i) Maybe the crashed tablesync worker cannot finish. e.g. A row insert
> > handled by tablesync can give a PK violation which also will crash
> > again and again for each re-launched/replacement tablesync worker.
> > This can be reproduced in the debugger. If the DropSubscription
> > doesn't clean-up the tablesync's slot then nobody will.
> >
> > ii) Also DROP SUBSCRIPTION code has locking (see code comment) "to
> > ensure that the launcher doesn't restart new worker during dropping
> > the subscription".
>
> Yeah, I have also read that comment but do you know how it is
> preventing relaunch? How does the subscription lock help?

Hmmm. I did see there is a matching lock in get_subscription_list of
launcher.c, which may be what that code comment was referring to. But
I am also currently unsure how this lock prevents anybody (e.g.
process_syncing_tables_for_apply) from executing another
logicalrep_worker_launch.

> > So executing DROP SUBSCRIPTION will prevent a newly
> > crashed tablesync from re-launching, so it won't be able to take care
> > of its own slot. If the DropSubscription doesn't clean-up that
> > tablesync's slot then nobody will.
> >
> > > 2.
> > > DropSubscription()
> > > {
> > > ..
> > > ReplicationSlotDropAtPubNode(
> > > + NULL,
> > > + conninfo, /* use conninfo to make a new connection. */
> > > + subname,
> > > + syncslotname);
> > > ..
> > > }
> > >
> > > With the above call, it will form a connection with the publisher and
> > > drop the required slots. I think we need to save the connection info
> > > so that we don't need to connect/disconnect for each slot to be
> > > dropped. Later in this function, we again connect and drop the apply
> > > worker slot. I think we should connect just once and drop the apply
> > > and tablesync slots, if any.
> >
> > OK. IIUC this is a suggestion for more efficient connection usage,
> > rather than an actual bug, right?
>
> Yes, it is for effective connection usage.

I have addressed this in the latest patch [v6]; a rough sketch of the
single-connection cleanup follows.
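For illustration only, here is a minimal sketch of that single-connection
cleanup (the syncslotnames list is hypothetical; walrcv_connect,
walrcv_disconnect, and ReplicationSlotDropAtPubNode are used as in the
patch):

    /* Sketch only: connect once, drop every slot, then disconnect. */
    wrconn = walrcv_connect(conninfo, true, subname, &err);
    if (wrconn != NULL)
    {
        ListCell   *lc;

        /* Drop each tablesync slot over the shared connection. */
        foreach(lc, syncslotnames)
            ReplicationSlotDropAtPubNode(wrconn, (char *) lfirst(lc));

        /* Drop the subscription's own slot, if it has one. */
        if (slotname)
            ReplicationSlotDropAtPubNode(wrconn, slotname);

        walrcv_disconnect(wrconn);
    }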
> > > 3.
> > > ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo,
> > > char *subname, char *slotname)
> > > {
> > > ..
> > > + PG_TRY();
> > > ..
> > > + PG_CATCH();
> > > + {
> > > + /* NOP. Just gobble any ERROR. */
> > > + }
> > > + PG_END_TRY();
> > >
> > > Why are we suppressing the error instead of handling the error in
> > > the same way as we do while dropping the apply worker slot in
> > > DropSubscription?
> >
> > This function is common - it is also called from the tablesync
> > finish_sync_worker. But in the finish_sync_worker case I wanted to
> > avoid throwing an ERROR which would cause the tablesync to crash and
> > relaunch (and crash/relaunch/repeat...) when all it was trying to do
> > in the first place was just cleanup and exit the process. Perhaps the
> > error suppression should be conditional depending on where this
> > function is called from?
>
> Yeah, that could be one way and if you follow my previous suggestion
> this function might change a bit more.

I have addressed this in the latest patch [v6]; one way to make the
suppression conditional is sketched below.
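For example, a sketch of one possible shape (the drop_tablesync_slot
wrapper and its fail_on_error flag are hypothetical, not from the patch;
PG_TRY/PG_CATCH/PG_RE_THROW/FlushErrorState are the standard backend
error-handling facilities):

    /*
     * Hypothetical wrapper: suppress the drop error only for callers
     * (like finish_sync_worker) that must not ERROR out and relaunch.
     */
    static void
    drop_tablesync_slot(WalReceiverConn *wrconn, char *slotname,
                        bool fail_on_error)
    {
        PG_TRY();
        {
            ReplicationSlotDropAtPubNode(wrconn, slotname);
        }
        PG_CATCH();
        {
            if (fail_on_error)
                PG_RE_THROW();      /* e.g. for DropSubscription */
            FlushErrorState();      /* otherwise swallow the error */
        }
        PG_END_TRY();
    }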
---
[v6] /messages/by-id/CAHut+PuCLty2HGNT6neyOcUmBNxOLo=ybQ2Yv-nTR4kFY-8QLw@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia.
Hi Amit.
PSA my v7 WIP patch for the Solution1.
This patch still applies on top of the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
(I understand you would like this to be delivered as a separate patch
independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added to the DropSubscription
and finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase (see the sketch after
this list)
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
* The v7 DropSubscription cleanup code has been rewritten since v6.
The subscription TAP tests have been executed many (7) times now
without observing any of the race problems that I previously reported
seeing when using the v6 patch.
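As a quick illustration of that relaunch behaviour, the decision in
LogicalRepSyncTableStart boils down to roughly this (simplified from the
attached patch):

    /* Sketch: a relaunched tablesync may resume after the COPY phase. */
    Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
           MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
           MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);

    if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
    {
        /*
         * COPY finished previously, but the worker crashed/restarted
         * before reaching SYNCDONE; skip the copy and go to catchup.
         */
        goto copy_table_done;
    }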
TODO / Known Issues:
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging scattered around which I
added to help my testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v7-0002-WIP-patch-for-the-Solution1.patch
From 243ffbfc7622af7bbfc69ee5aa816198568c019c Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 23 Dec 2020 17:08:30 +1100
Subject: [PATCH v7] WIP patch for the Solution1.
This patch still applies on top of the v30 patch set [1] from the other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
(I understand this should be delivered as a separate patch independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added to the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
* The v7 DropSubscription cleanup code is quite different now from how it was in v6. The subscription TAP tests have been executed 6x now without observing any of the race problems that were sometimes seen with the v6 patch.
TODO / Known Issues:
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging scattered around which I added to help my testing during development
* Address review comments
---
src/backend/commands/subscriptioncmds.c | 221 +++++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 231 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 21 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
6 files changed, 375 insertions(+), 106 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..b98a7e5 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -960,7 +961,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1048,76 +1048,187 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations should already have had their cleanup done.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, then the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot after it reaches
+ * SYNCDONE state. Next the apply worker moves the tablesync from
+ * SYNCDONE to READY state.
+ *
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because the slot is already dropped.
+ * In this case suppress the drop-slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
- WalRcvExecResult *res;
+ WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
@@ -1135,13 +1246,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..0d7e4ce 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,33 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +299,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +311,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -407,12 +443,41 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
{
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
+
if (!started_tx)
{
StartTransactionCommand();
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then that will
+ * hang because the replorigin_drop considers the owning tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription can know that all
+ * READY workers have already had their origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +873,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +916,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,17 +942,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +958,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +985,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +1010,105 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
- table_close(rel, NoLock);
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during the table copy then clean up the created slot. */
+ if (!copied_ok)
+ {
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
+
+ pfree(slotname);
+ slotname = NULL;
+ }
+ }
+ PG_END_TRY();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ CommitTransactionCommand();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance it to the LSN obtained from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..5f19089 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v7-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v7] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
Hi Amit.
PSA my v8 WIP patch for the Solution1.
This has the same code changes as the v7 patch, but the v8 patch can
be applied to the current PG OSS master code base.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added to the DropSubscription
and finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
* The DropSubscription cleanup code was changed a lot in v7. The
subscription TAP tests have been executed 6x now without observing any
of the race problems that were sometimes seen with the v6 patch.
TODO / Known Issues:
* Help / comments
* There is temporary "!!>>" excessive logging scattered around which I
added to help my testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v8-0001-WIP-patch-for-the-Solution1.patch
From af346cbdc1d3091af190b65b530cd572255eff7e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 23 Dec 2020 20:29:53 +1100
Subject: [PATCH v8] WIP patch for the Solution1.
This has the same code changes as the v7 patch, but it can be applied to the current PG OSS master code base.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added to the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
* The DropSubscription cleanup code was changed a lot in v7. The subscription TAP tests have been executed 6x now without observing any of the race problems that were sometimes seen with the v6 patch.
TODO / Known Issues:
* Help / comments
* There is temporary "!!>>" excessive logging scattered around which I added to help my testing during development
* Address review comments
---
src/backend/commands/subscriptioncmds.c | 221 +++++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 231 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
6 files changed, 374 insertions(+), 104 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 1696454..9472fca 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,76 +1016,187 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations should already have had their clean-up done.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, then the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot after it reaches
+ * SYNCDONE state. Next the apply worker moves the tablesync from
+ * SYNCDONE to READY state.
+ *
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress the drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
- WalRcvExecResult *res;
+ WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
@@ -1103,13 +1214,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 6259606..3cb0aad 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,33 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +299,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +311,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -406,12 +442,41 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
{
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
+
if (!started_tx)
{
StartTransactionCommand();
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then it would
+ * hang, because replorigin_drop considers the owning tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription can know that all
+ * READY workers have already had their origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +872,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -824,6 +915,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,17 +941,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -874,7 +957,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +984,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1009,105 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
- table_close(rel, NoLock);
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ if (!copied_ok)
+ {
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
+
+ pfree(slotname);
+ slotname = NULL;
+ }
+ }
+ PG_END_TRY();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ CommitTransactionCommand();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to the LSN obtained from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 3874939..d28cfb8 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..5f19089 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Wed, Dec 23, 2020 at 11:49 AM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA my v7 WIP patch for the Solution1.
Few comments:
================
1.
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress and drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();
So, does this situation happen when we try to drop a subscription after
the state is changed to syncdone but not yet to ready? If so, then can't
we write a function GetSubscriptionNotDoneRelations similar to
GetSubscriptionNotReadyRelations where we get a list of relations that
are not yet in the done state? I think this should be safe because once
we are here we shouldn't be allowed to start a new worker and old
workers are already stopped by this function.
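One possible shape for such a function (an untested sketch only; the
function name and the filter-in-the-scan-loop approach are assumptions,
modeled on the existing GetSubscriptionNotReadyRelations; srsublsn
handling is elided):

#include "access/genam.h"
#include "access/htup_details.h"
#include "access/stratnum.h"
#include "access/table.h"
#include "catalog/pg_subscription_rel.h"
#include "utils/fmgroids.h"

/*
 * Sketch: like GetSubscriptionNotReadyRelations, but also skip
 * SYNCDONE relations, so DropSubscription would only see relations
 * whose tablesync resources may still exist.
 */
List *
GetSubscriptionNotDoneRelations(Oid subid)
{
	List	   *res = NIL;
	Relation	rel;
	HeapTuple	tup;
	ScanKeyData skey[1];
	SysScanDesc scan;

	rel = table_open(SubscriptionRelRelationId, AccessShareLock);

	/* Match all pg_subscription_rel rows for this subscription. */
	ScanKeyInit(&skey[0],
				Anum_pg_subscription_rel_srsubid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(subid));

	scan = systable_beginscan(rel, InvalidOid, false, NULL, 1, skey);

	while (HeapTupleIsValid(tup = systable_getnext(scan)))
	{
		Form_pg_subscription_rel subrel =
			(Form_pg_subscription_rel) GETSTRUCT(tup);
		SubscriptionRelState *relstate;

		/* Skip relations whose table sync has already finished. */
		if (subrel->srsubstate == SUBREL_STATE_SYNCDONE ||
			subrel->srsubstate == SUBREL_STATE_READY)
			continue;

		relstate = (SubscriptionRelState *) palloc0(sizeof(SubscriptionRelState));
		relstate->relid = subrel->srrelid;
		relstate->state = subrel->srsubstate;
		res = lappend(res, relstate);
	}

	systable_endscan(scan);
	table_close(rel, AccessShareLock);

	return res;
}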
2. Your changes in LogicalRepSyncTableStart() don't seem to be
right. IIUC, you are copying the table in one transaction, then
updating the state to SUBREL_STATE_COPYDONE in another transaction,
and after that doing replorigin_advance. Consider what happens if we
error out after the first txn (in which we have copied the table) is
committed: after the restart, it will again try to copy and lead to an
error. Similarly, consider if we error out after the second
transaction: we won't know where to start decoding from. I think all of
these should be done in a single transaction.
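In other words, roughly this ordering (an illustrative fragment only,
not the patch itself; originid and origin_startpos come from the
surrounding tablesync code):

/*
 * Keep the COPYDONE catalog update and the origin advance inside one
 * transaction, so a crash either loses both (the copy is retried) or
 * keeps both (decoding resumes from the recorded LSN).
 */
StartTransactionCommand();

UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
						   MyLogicalRepWorker->relid,
						   SUBREL_STATE_COPYDONE,
						   MyLogicalRepWorker->relstate_lsn);

replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
				   true /* go backward */ , true /* WAL log */ );

CommitTransactionCommand();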
--
With Regards,
Amit Kapila.
On Tue, Dec 22, 2020 at 4:58 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 11:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
==================

Thanks for your feedback.

1.
+ * FIXME 3 - Crashed tablesync workers may also have remaining slots because I don't think
+ * such workers are even iterated by this loop, and nobody else is removing them.
+ */
+ if (slotname)
+ {

The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?

Yes, normally if the tablesync can complete it should behave like that.
But I think there are other scenarios where it may be unable to
clean up after itself. For example:

i) Maybe the crashed tablesync worker cannot finish. e.g. A row insert
handled by tablesync can give a PK violation which also will crash
again and again for each re-launched/replacement tablesync worker.
This can be reproduced in the debugger. If the DropSubscription
doesn't clean up the tablesync's slot then nobody will.

ii) Also the DROP SUBSCRIPTION code has locking (see the code comment)
"to ensure that the launcher doesn't restart new worker during dropping
the subscription".

Yeah, I have also read that comment but do you know how it is
preventing relaunch? How does the subscription lock help?

Hmmm. I did see there is a matching lock in get_subscription_list of
launcher.c, which may be what that code comment was referring to. But
I am also currently unsure how this lock prevents anybody (e.g.
process_syncing_tables_for_apply) from executing another
logicalrep_worker_launch.

process_syncing_tables_for_apply will be called by the apply worker
and we are stopping the apply worker. So, after that the launcher won't
start a new apply worker because of get_subscription_list(), and if the
apply worker is not started then it won't be able to start a tablesync
worker. So, we need the handling of crashed tablesync workers here
such that we need to drop any new sync slots.
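Condensed, the cleanup being discussed looks roughly like this
(declarations, error handling, and logging elided; the full loop is in
the attached patch's DropSubscription):

/*
 * Walk the not-yet-READY relations and drop any tablesync slot that a
 * crashed worker may have left behind on the publisher.
 */
rstates = GetSubscriptionNotReadyRelations(subid);
foreach(lc, rstates)
{
	SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
	char	   *syncslotname;

	if (!OidIsValid(rstate->relid))
		continue;				/* not a tablesync entry */

	syncslotname = ReplicationSlotNameForTablesync(subid, rstate->relid);
	ReplicationSlotDropAtPubNode(wrconn, syncslotname);
	pfree(syncslotname);
}
list_free(rstates);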
--
With Regards,
Amit Kapila.
Hi Amit.
PSA my v9 WIP patch for the Solution1, which addresses some recent
review comments and makes other minor changes.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name (see the
usage sketch after this list).
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a
single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply
worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced in v7 to take care of
crashed sync workers.
* Minor updates to PG docs
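As a usage sketch of the new naming scheme (the OIDs here are made up
for illustration):

/*
 * The slot name depends only on the subscription OID and relation OID,
 * never on the subscription's own slot name.
 */
char	   *syncslotname = ReplicationSlotNameForTablesync(16394, 16385);

/* syncslotname is now "pg_16394_sync_16385"; the caller pfree()s it. */
pfree(syncslotname);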
TODO / Known Issues:
* The source includes temporary "!!>>" debug logging which I added to
help testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v9-0001-WIP-patch-for-the-Solution1.patch
From b93e330a1ffe5c43d09f72e38726363bcb49d890 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 30 Dec 2020 15:02:21 +1100
Subject: [PATCH v9] WIP patch for the Solution1.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced in v7 to take care of crashed sync workers.
* Minor updates to PG docs
TODO / Known Issues:
* The source includes temporary "!!>>" debug logging which I added to help testing during development
* Address review comments
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 222 +++++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 232 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
7 files changed, 376 insertions(+), 105 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d988636..266615c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 1696454..c366614 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,76 +1016,188 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have had their clean-up done.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, then the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot after it reaches
+ * SYNCDONE state. Then the apply worker moves the tablesync from
+ * SYNCDONE to READY state.
+ *
+ * Rarely, the DropSubscription may be issued when a tablesync is
+ * still in SYNCDONE but has not yet reached READY state.
+ * If this happens then the drop slot could fail since it was
+ * already dropped, so suppress the error.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ pfree(syncslotname);
+ PG_RE_THROW();
+ }
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
- WalRcvExecResult *res;
+ WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
@@ -1103,13 +1215,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 6259606..8180f49 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -43,13 +43,17 @@
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediate state COPYDONE which comes after
+ * CATCHUP and before SYNCDONE. This state indicates that the initial
+ * table copy phase has completed, so if the worker crashes before
+ * reaching SYNCDONE the copy will not be re-attempted.
* - Once the state is set to SYNCDONE, the apply will continue tracking
* the table until it reaches the SYNCDONE stream position, at which
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker COPYDONE) -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -64,6 +68,7 @@
* -> set in memory CATCHUP
* -> enter wait-loop
* sync:10
+ * -> set in catalog COPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -79,6 +84,7 @@
* -> set in memory CATCHUP
* -> continue per-table filtering
* sync:10
+ * -> set in catalog COPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -102,6 +108,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +147,33 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +305,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +317,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -406,12 +448,41 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
{
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
+
if (!started_tx)
{
StartTransactionCommand();
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then it would
+ * hang, because replorigin_drop considers the owning tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription can know that all
+ * READY workers have already had their origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +878,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -849,17 +946,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -874,7 +962,19 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ StartTransactionCommand();
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +990,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1015,98 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to the LSN obtained from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
- table_close(rel, NoLock);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ }
- /* Make the copy visible. */
- CommandCounterIncrement();
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 3874939..d28cfb8 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..5f19089 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Wed, Dec 23, 2020 at 9:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Dec 22, 2020 at 4:58 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 11:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
process_syncing_tables_for_apply will be called by the apply worker
and we are stopping the apply worker. So, after that the launcher won't
start a new apply worker because of get_subscription_list(), and if the
apply worker is not started then it won't be able to start a tablesync
worker. So, we need the handling of crashed tablesync workers here
such that we need to drop any new sync slots.
Yes, in the v6 patch code this was a problem in need of handling. But
since the v7 patch the DropSubscription code is now using a separate
GetSubscriptionNotReadyRelations loop to handle the cleanup of
potentially leftover slots from crashed tablesync workers (i.e.
workers that never got to a READY state).
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress and drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();

So, does this situation happen when we try to drop subscription after
the state is changed to syncdone but not syncready. If so, then can't
we write a function GetSubscriptionNotDoneRelations similar to
GetSubscriptionNotReadyRelations where we get a list of relations that
are not in done stage. I think this should be safe because once we are
here we shouldn't be allowed to start a new worker and old workers are
already stopped by this function.
Yes, but I don't see how adding such a function is an improvement over
the existing code:
e.g.1. GetSubscriptionNotDoneRelations will include the READY state
(which we don't want) just like GetSubscriptionNotReadyRelations
includes the SYNCDONE state.
e.g.2. Or, something like GetSubscriptionNotDoneAndNotReadyRelations
would be an unnecessary overkill replacement for the current simple
"if".
AFAIK the code is OK as-is. That "FIXME" comment was really meant only
to highlight this for review, rather than to imply something needed to
be fixed. I have removed that "FIXME" comment in the latest patch
[v9].
2. Your changes in LogicalRepSyncTableStart() doesn't seem to be
right. IIUC, you are copying the table in one transaction, then
updating the state to SUBREL_STATE_COPYDONE in another transaction,
and after that doing replorigin_advance. Consider what happens if we
error out after the first txn is committed in which we have copied the
table. After the restart, it will again try to copy and lead to an
error. Similarly, consider if we error out after the second
transaction, we won't know where to start decoding from. I think all
these should be done in a single transaction.
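
To illustrate, a minimal sketch of that single-transaction ordering (the
function names are the ones the patch already uses; the exact code here
is only an assumption of the shape, not the patch itself):

StartTransactionCommand();

/* 1. Do the initial data copy. */
PushActiveSnapshot(GetTransactionSnapshot());
copy_table(rel);
PopActiveSnapshot();

/* 2. Persist that the copy phase completed. */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
                           MyLogicalRepWorker->relid,
                           SUBREL_STATE_COPYDONE,
                           MyLogicalRepWorker->relstate_lsn);

/* 3. Record the decoding start position in the same transaction. */
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
                   true /* go backward */ , true /* WAL log */ );

CommitTransactionCommand();

This way a crash either rolls back everything (the copy is redone from
scratch) or loses nothing (COPYDONE and the start position are both on
disk).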
Fixed as suggested. Please see latest patch [v9]
---
[v9] /messages/by-id/CAHut+Pv8ShLmrjCriVU+tprk_9b2kwBxYK2oGSn5Eb__kWVc7A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Dec 30, 2020 at 11:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress and drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();

So, does this situation happen when we try to drop subscription after
the state is changed to syncdone but not syncready. If so, then can't
we write a function GetSubscriptionNotDoneRelations similar to
GetSubscriptionNotReadyRelations where we get a list of relations that
are not in done stage. I think this should be safe because once we are
here we shouldn't be allowed to start a new worker and old workers are
already stopped by this function.

Yes, but I don't see how adding such a function is an improvement over
the existing code:
The advantage is that we don't need to use try..catch to deal with
such conditions which I don't think is a good way to deal with such
cases. Also, even after using try...catch, still, we can leak the
slots because the patch drops the slot after changing the state to
syncdone and if there is any error while dropping the slot, it simply
skips it. So, it is possible that the rel state is syncdone but the
slot still exists and we get an error due to some different reason,
and then we will silently skip it again and allow the subscription to
be dropped.
I think instead what we should do is to drop the slot before we change
the rel state to syncdone. Also, if the apply worker fails to drop the
slot, it should try to drop it again after restart. In
DropSubscription, we can then check if the rel state is not SYNC or
READY, and drop the corresponding slots.
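
A rough sketch of that ordering in the sync worker (names taken from
the patch; the exact code is an assumption):

/*
 * Drop the tablesync slot first; only then persist SYNCDONE. If the
 * drop errors out, the state is still not SYNCDONE, so the restarted
 * worker retries the drop instead of leaking the slot.
 */
ReplicationSlotDropAtPubNode(wrconn, syncslotname);

StartTransactionCommand();
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
                           MyLogicalRepWorker->relid,
                           SUBREL_STATE_SYNCDONE,
                           MyLogicalRepWorker->relstate_lsn);
CommitTransactionCommand();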
e.g.1. GetSubscriptionNotDoneRelations will include the READY state
(which we don't want) just like GetSubscriptionNotReadyRelations
includes the SYNCDONE state.
e.g.2. Or, something like GetSubscriptionNotDoneAndNotReadyRelations
would be an unnecessary overkill replacement for the current simple
"if".
or we can probably modify the function as
GetSubscriptionRelationsNotInStates and pass it an array of states
which we don't want.
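
A sketch of what such a helper might look like (the function name comes
from the suggestion above; the skip_states parameter and the
catalog-scan boilerplate, modelled on GetSubscriptionNotReadyRelations,
are assumptions):

/*
 * Hypothetical: return the relations of the given subscription whose
 * state is none of the characters in 'skip_states' (e.g. "sr" to skip
 * both SYNCDONE and READY). srsublsn handling is omitted for brevity.
 */
static List *
GetSubscriptionRelationsNotInStates(Oid subid, const char *skip_states)
{
    List       *res = NIL;
    Relation    rel;
    HeapTuple   tup;
    ScanKeyData skey[1];
    SysScanDesc scan;

    rel = table_open(SubscriptionRelRelationId, AccessShareLock);

    /* Scan only the rows belonging to this subscription. */
    ScanKeyInit(&skey[0],
                Anum_pg_subscription_rel_srsubid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(subid));

    scan = systable_beginscan(rel, InvalidOid, false, NULL, 1, skey);

    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        Form_pg_subscription_rel subrel =
            (Form_pg_subscription_rel) GETSTRUCT(tup);
        SubscriptionRelState *relstate;

        /* Filter out the states the caller wants to skip. */
        if (strchr(skip_states, subrel->srsubstate) != NULL)
            continue;

        relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState));
        relstate->relid = subrel->srrelid;
        relstate->state = subrel->srsubstate;

        res = lappend(res, relstate);
    }

    systable_endscan(scan);
    table_close(rel, AccessShareLock);

    return res;
}

DropSubscription could then fetch just the relations whose tablesync
slots may still exist with something like
GetSubscriptionRelationsNotInStates(subid, "sr").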
AFAIK the code is OK as-is.
As described above, there are still race conditions where we can leak
slots and also this doesn't look clean.
Few other comments:
=================
1.
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot
\"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot
\"%s\".", syncslotname);
...
...
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot
\"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot
\"%s\".", syncslotname);
Remove these and other elogs added to aid debugging or testing. If you
need these for development purposes then move these to a separate patch.
2. Remove WIP from the commit message and patch name.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA my v10 patch for the Solution1.
v10 is essentially the same as v9, except now all the temporary "!!>>"
logging has been isolated to a separate (optional) patch 0002.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* the DropSubscription cleanup code was enhanced (v7+) to take care of
crashed sync workers.
* minor updates to PG docs
TODO / Known Issues:
* address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v10-0002-Tablesync-extra-logging.patch
From 568246ff3d2e77b9010a6bd48188bbf4ccdf8cff Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 4 Jan 2021 19:30:35 +1100
Subject: [PATCH v10] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 6 ++++--
src/backend/replication/logical/tablesync.c | 20 +++++++++++++++-----
2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index f829e5e..6e05407 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1105,8 +1105,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
PG_TRY();
{
- elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_CATCH();
{
@@ -1137,8 +1138,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
}
}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 206b2de..7eec7d2 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -158,8 +158,9 @@ finish_sync_worker(void)
PG_TRY();
{
- elog(DEBUG1, "finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_CATCH();
{
@@ -469,10 +470,12 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
}
}
@@ -966,7 +969,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
StartTransactionCommand();
goto copy_table_done;
}
@@ -1014,6 +1017,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1042,8 +1046,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
PG_CATCH();
{
/* If something failed during copy table then cleanup the created slot. */
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
pfree(slotname);
slotname = NULL;
@@ -1072,9 +1077,12 @@ copy_table_done:
/*
* Origin tracking does not exist. Create it now, and advance to LSN got from walrcv_create_slot.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1083,12 +1091,14 @@ copy_table_done:
/*
* Origin tracking already exists.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
*origin_startpos = replorigin_session_get_progress(false);
}
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
v10-0001-Tablesync-Solution1.patch
From ed051282784561a62d1c13789d7269cc0c85461f Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 4 Jan 2021 19:10:18 +1100
Subject: [PATCH v10] Tablesync Solution1.
This is essentially same as v9 except all the temporary development logging is now isolated to a separate patch.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot na
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worked now allowing multiple tx instead of single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced in v7 to take care of crashed sync workers.
* Minor updates to PG docs
TODO / Known Issues:
* Address review comments
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 218 ++++++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 220 +++++++++++++++++++++++-----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
7 files changed, 361 insertions(+), 104 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..8fcc8b1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..f829e5e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,73 +1016,183 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* XXX - OK to just log? */
+ elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot after it reaches
+ * SYNCDONE state. Then the apply worker moves the tablesync from
+ * SYNCDONE to READY state.
+ *
+ * Rarely, the DropSubscription may be issued in between when a
+ * tablesync still is in SYNCDONE, but not yet reached READY state.
+ * If this happens then the drop slot could fail since it was
+ * already dropped, so suppress the error.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ pfree(syncslotname);
+ PG_RE_THROW();
+ }
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
@@ -1103,13 +1213,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b4b4830 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..206b2de 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -43,13 +43,17 @@
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has a intermediary state COPYDONE which comes after
+ * CATCHUP and before SYNCDONE. This state indicates that the initial
+ * table copy phase has completed, so if the worker crashes before
+ * reaching SYNCDONE the copy will not be re-attempted.
* - Once the state is set to SYNCDONE, the apply will continue tracking
* the table until it reaches the SYNCDONE stream position, at which
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker COPYDONE) -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -64,6 +68,7 @@
* -> set in memory CATCHUP
* -> enter wait-loop
* sync:10
+ * -> set in catalog COPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -79,6 +84,7 @@
* -> set in memory CATCHUP
* -> continue per-table filtering
* sync:10
+ * -> set in catalog COPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -102,6 +108,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +147,32 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(DEBUG1, "finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +304,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +316,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -412,6 +453,32 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription can know that all
+ * READY workers have already had their origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +875,31 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -849,17 +941,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -874,7 +957,19 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ StartTransactionCommand();
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +985,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1010,91 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ pfree(slotname);
+ slotname = NULL;
- table_close(rel, NoLock);
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ }
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..4bd93e2 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..e617602 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Mon, Jan 4, 2021 at 8:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
=================
1.
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
...
...
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);

Remove these and other elogs added to aid debugging or testing. If you
need these for development purposes then move these to a separate patch.

Fixed in latest patch (v10).

2. Remove WIP from the commit message and patch name.

Fixed in latest patch (v10)
---
v10 = /messages/by-id/CAHut+PuzPmFzk3p4oL9H3nkiY6utFryV9c5dW6kRhCe_RY=gnA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 4, 2021 at 2:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
=================
Few more comments on v9:
======================
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop
tablesync slot \"%s\".",
+ syncslotname);
+ }
Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.
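
i.e. something along these lines (a sketch; the message wording is an
assumption):

if (!wrconn)
    ereport(ERROR,
            (errmsg("could not connect to publisher when attempting to "
                    "drop the tablesync slot \"%s\"", syncslotname)));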
2.
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {
There is no need to start a separate block '{' here.
3.
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
You can mention in the comments that sublsn will be NULL for this
state as it is mentioned for other similar states. Can we think of
using any letter in lower case for this as all other states are in
lower-case except for this which makes it look a bit odd? We can use
'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine
if you have any better ideas.
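
For example, one possible rendering of that suggestion (not settled):

#define SUBREL_STATE_COPYDONE 'f' /* tablesync copy phase is completed
                                   * (sublsn NULL) */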
4.
LogicalRepSyncTableStart()
{
..
..
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u",
MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
got from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create
\"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup
\"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance
\"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup
\"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2
replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
..
..
}
I am not sure if this code is correct because, for the very first time
when the copy is done, we don't expect replication origin to exist
whereas this code will silently use already existing replication
origin which can lead to a wrong start position for the slot. In such
a case it should error out. I guess we should create the replication
origin before making the state as copydone. I feel we should even have
a test case for this as it is not difficult to have a pre-existing
replication origin.
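
In other words, the first-time (just-copied) path could look roughly
like this (a sketch, reusing the patch's originname variable):

/* The tablesync origin must not pre-exist on the first-time path. */
originid = replorigin_by_name(originname, true /* missing_ok */ );
if (OidIsValid(originid))
    ereport(ERROR,
            (errcode(ERRCODE_DUPLICATE_OBJECT),
             errmsg("replication origin \"%s\" already exists",
                    originname)));
originid = replorigin_create(originname);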
5. Is it possible to write a testcase where we fail (say due to pk
violation or some other error) after the initial copy is done, then
remove the conflicting row and allow a copy to be completed? If we
already have any such test then it is fine.
6.
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
This comment looks a bit odd. The first line appears to be too short.
We have a limit of 80 chars but this is much less than that.
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 &&
replication_state->acquired_by != MyProcPid)
{
I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.
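
i.e. a sketch of that reordering:

/*
 * Advance the origin before this session acquires it; then the
 * acquired_by check in replorigin_advance can stay as it is.
 */
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
                   true /* go backward */ , true /* WAL log */ );
replorigin_session_setup(originid);
replorigin_session_origin = originid;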
8.
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
It is better to submit this as a separate patch.
--
With Regards,
Amit Kapila.
On Wed, Dec 30, 2020 at 5:08 PM Peter Smith <smithpb2250@gmail.com> wrote:
PSA my v9 WIP patch for the Solution1 which addresses some recent
review comments, and other minor changes.
I did some tests using the test suite prepared by Erik Rijkers in [1]/messages/by-id/93d02794068482f96d31b002e0eb248d@xs4all.nl
during the initial design of tablesync.
Back then, they had seen some errors while doing multiple commits in
initial tablesync. So I've rerun the test script on the v9 patch
applied on HEAD and found no errors.
The script runs pgbench, creates a pub/sub on a standby server, and
all of the pgbench tables are replicated to the standby. The contents
of the tables are compared at
the end of each run to make sure they are identical.
I have run it for around 12 hours, and it worked without any errors.
Attaching the script I used.
regards,
Ajin Cherian
Fujitsu Australia
[1]: /messages/by-id/93d02794068482f96d31b002e0eb248d@xs4all.nl
Attachments:
Hi Amit.
PSA the v11 patch for the Tablesync Solution1.
Difference from v10:
- Addresses several recent review comments.
- pg_indent has been run
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* the DropSubscription cleanup code was enhanced (v7+) to take care of
crashed sync workers.
* minor updates to PG docs
TODO / Known Issues:
* address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v11-0001-Tablesync-Solution1.patch
From 17a861c973a44fba0286a0879ca8f73e22fc924f Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 5 Jan 2021 20:07:53 +1100
Subject: [PATCH v11] Tablesync Solution1.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot na
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worked now allowing multiple tx instead of single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* the DropSubscription cleanup code was enhanced (v7+) to take care of crashed sync workers.
* minor updates to PG docs
TODO / Known Issues:
* address review comments
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 219 ++++++++++++++++++-------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 238 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
7 files changed, 379 insertions(+), 104 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..8fcc8b1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..16b2bee 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,73 +1016,184 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the
+ * connection to publisher is broken, but the
+ * DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any
+ * tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* XXX - OK to just log? */
+ elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot
+ * after it reaches SYNCDONE state. Then the apply
+ * worker moves the tablesync from SYNCDONE to
+ * READY state.
+ *
+ * Rarely, the DropSubscription may be issued in
+ * between when a tablesync still is in SYNCDONE,
+ * but not yet reached READY state. If this
+ * happens then the drop slot could fail since it
+ * was already dropped, so suppress the error.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ pfree(syncslotname);
+ PG_RE_THROW();
+ }
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
@@ -1103,13 +1214,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b299285 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..f83f06f 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -43,13 +43,17 @@
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediary state TCOPYDONE which comes after
+ * CATCHUP and before SYNCDONE. This state indicates that the initial
+ * table copy phase has completed, so if the worker crashes before
+ * reaching SYNCDONE the copy will not be re-attempted.
* - Once the state is set to SYNCDONE, the apply will continue tracking
* the table until it reaches the SYNCDONE stream position, at which
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -64,6 +68,7 @@
* -> set in memory CATCHUP
* -> enter wait-loop
* sync:10
+ * -> set in catalog TCOPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -79,6 +84,7 @@
* -> set in memory CATCHUP
* -> continue per-table filtering
* sync:10
+ * -> set in catalog TCOPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -102,6 +108,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +147,32 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(DEBUG1, "finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise it would
+ * cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +304,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +316,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -412,6 +453,35 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +878,31 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +919,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +946,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +962,31 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1002,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1027,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+
+ pfree(slotname);
+ slotname = NULL;
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
+ * got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..2c80405 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..e617602 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v11-0002-Tablesync-extra-logging.patch (application/octet-stream)
From 5209a449efca9a7e503274cedcd76c3831df72a5 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 5 Jan 2021 20:40:48 +1100
Subject: [PATCH v11] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 6 ++++--
src/backend/replication/logical/tablesync.c | 20 +++++++++++++++-----
2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 16b2bee..a1881cc 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1105,8 +1105,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
PG_TRY();
{
- elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_CATCH();
{
@@ -1139,8 +1140,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
}
}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f83f06f..afc77dd 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -158,8 +158,9 @@ finish_sync_worker(void)
PG_TRY();
{
- elog(DEBUG1, "finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_CATCH();
{
@@ -472,10 +473,12 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
}
}
@@ -974,15 +977,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1031,6 +1036,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1062,8 +1068,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
pfree(slotname);
slotname = NULL;
@@ -1080,9 +1087,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* Origin tracking does not exist. Create it now, and advance to LSN
* got from walrcv_create_slot.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1105,7 +1115,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few more comments on v9:
======================
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }

Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.
Previously there was only the subscription slot. If the connection was
broken and caused an error then it was still possible for the user to
disassociate the subscription from the slot using ALTER SUBSCRIPTION
... SET (slot_name = NONE). And then (when the slotname is NULL) the
DropSubscription could complete OK. I expect in that case the Admin
still had some slot clean-up they would need to do on the Publisher
machine.
But now we have the tablesync slots so if I caused them to give ERROR
when the connection is broken then the subscription would become
un-droppable. If you think that having ERROR and an undroppable
subscription is better than the current WARNING then please let me
know - there is no problem to change it.
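Condensed, the behaviour being defended here is roughly the following (a sketch only; in the actual patch the ERROR is raised earlier, at connect time, and the LOG happens per tablesync slot):

if (!wrconn)
{
	if (slotname)
		ereport(ERROR,
				(errmsg("could not connect to publisher when attempting to "
						"drop the replication slot \"%s\"", slotname)));

	/*
	 * slot_name = NONE: the user deliberately disassociated the
	 * subscription, so only LOG and let DROP SUBSCRIPTION proceed.
	 */
	elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
		 syncslotname);
}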
2.
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {

There is no need to start a separate block '{' here.
Written this way so I can declare variables only in the scope where they
are needed. I didn’t see anything in the PG coding conventions discouraging
this practice: https://www.postgresql.org/docs/devel/source.html
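For illustration, the pattern in question is just this (trimmed from the patch; the real hunk also wraps the drop in PG_TRY):

{
	char	   *syncslotname = ReplicationSlotNameForTablesync(subid, relid);

	/* syncslotname is only needed within this block */
	ReplicationSlotDropAtPubNode(wrconn, syncslotname);
	pfree(syncslotname);
}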
3.
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */

You can mention in the comments that sublsn will be NULL for this
state as it is mentioned for other similar states. Can we think of
using any letter in lower case for this as all other states are in
lower-case except for this, which makes it look a bit odd? We can use
'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine
if you have any better ideas.
Fixed in latest patch [v11]
4.
LogicalRepSyncTableStart()
{
..
..
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN got from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
..
..
}

I am not sure if this code is correct because, for the very first time
when the copy is done, we don't expect replication origin to exist
whereas this code will silently use already existing replication
origin which can lead to a wrong start position for the slot. In such
a case it should error out. I guess we should create the replication
origin before making the state as copydone. I feel we should even have
a test case for this as it is not difficult to have a pre-existing
replication origin.
Fixed as suggested in latest patch [v11]
5. Is it possible to write a testcase where we fail (say due to pk
violation or some other error) after the initial copy is done, then
remove the conflicting row and allow a copy to be completed? If we
already have any such test then it is fine.
Causing a PK violation during the initial copy is not a problem to
test, but doing it after the initial copy is difficult. I have done
exactly this test scenario before but I thought it cannot be
automated. E.g. to cause a PK violation error somewhere between
COPYDONE and SYNCDONE means that the offending insert (the one which
tablesync will fail to replicate) has to be sent while the tablesync
is in CATCHUP mode. But AFAIK that can only be achieved using the
debugger to get the timing right.
6.
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */

This comment looks a bit odd. The first line appears to be too short.
We have a limit of 80 chars but this is much less than that.
Fixed in latest patch [v11]
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{

I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.

TODO

8.
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the

It is better to submit this as a separate patch.
Done. Please see CF entry. https://commitfest.postgresql.org/32/2926/
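For reference, the reordering suggested in comment 7 would look roughly like this (a sketch built from the patch's existing calls, untested):

/* Create and advance the origin before acquiring it for the session. */
originid = replorigin_create(originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
				   true /* go backward */ , true /* WAL log */ );
replorigin_session_setup(originid);
replorigin_session_origin = originid;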
----
[v11] = /messages/by-id/CAHut+Pu0A6TUPgYC-L3BKYQfa_ScL31kOV_3RsB3ActdkL1iBQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Tue, Jan 5, 2021 at 3:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few more comments on v9:
======================
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }

Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.Previously there was only the subscription slot. If the connection was
broken and caused an error then it was still possible for the user to
disassociate the subscription from the slot using ALTER SUBSCRIPTION
... SET (slot_name = NONE). And then (when the slotname is NULL) the
DropSubscription could complete OK. I expect in that case the Admin
still had some slot clean-up they would need to do on the Publisher
machine.
I think such an option could probably be used for user-created slots
but it would be difficult for even Admin to know about these
internally created slots associated with the particular subscription.
I would say it is better to ERROR out.
2.
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {

There is no need to start a separate block '{' here.
Written this way so I can declare variables only in the scope where they
are needed. I didn’t see anything in the PG coding conventions discouraging
this practice: https://www.postgresql.org/docs/devel/source.html
But do we encourage such a coding convention for declaring variables? I
find it difficult to read such code. I guess as a one-off we can do
this, but I don't see a compelling need here.
3.
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */

You can mention in the comments that sublsn will be NULL for this
state as it is mentioned for other similar states. Can we think of
using any letter in lower case for this as all other states are in
lower-case except for this which makes it a look bit odd? We can use
'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine
if you have any better ideas.Fixed in latest patch [v11]
It is still not reflected in the docs. See below:
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration
count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
--
With Regards,
Amit Kapila.
On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }

Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.Previously there was only the subscription slot. If the connection was
broken and caused an error then it was still possible for the user to
disassociate the subscription from the slot using ALTER SUBSCRIPTION
... SET (slot_name = NONE). And then (when the slotname is NULL) the
DropSubscription could complete OK. I expect in that case the Admin
still had some slot clean-up they would need to do on the Publisher
machine.I think such an option could probably be used for user-created slots
but it would be difficult for even Admin to know about these
internally created slots associated with the particular subscription.
I would say it is better to ERROR out.
I am having doubts that ERROR is the best choice here. There is a long
note in https://www.postgresql.org/docs/devel/sql-dropsubscription.html
which describes this problem for the subscription slot and how to
disassociate the name to give a workaround “To proceed in this
situation”.
OTOH if we make the tablesync slot unconditionally ERROR for a broken
connection then there is no way to proceed, and the whole (slot_name =
NONE) workaround becomes ineffectual. Note - the current patch code is
only logging when the user has already disassociated the slot name; of
course normally (when the slot name was not disassociated) table slots
will give ERROR for broken connections.
IMO, if the user has disassociated the slot name then they have
already made their decision that they REALLY DO want to “proceed in
this situation”. So I thought we should let them proceed.
What do you think?
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Wed, Jan 6, 2021 at 4:32 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }

Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.Previously there was only the subscription slot. If the connection was
broken and caused an error then it was still possible for the user to
disassociate the subscription from the slot using ALTER SUBSCRIPTION
... SET (slot_name = NONE). And then (when the slotname is NULL) the
DropSubscription could complete OK. I expect in that case the Admin
still had some slot clean-up they would need to do on the Publisher
machine.I think such an option could probably be used for user-created slots
but it would be difficult for even Admin to know about these
internally created slots associated with the particular subscription.
I would say it is better to ERROR out.I am having doubts that ERROR is the best choice here. There is a long
note in https://www.postgresql.org/docs/devel/sql-dropsubscription.html
which describes this problem for the subscription slot and how to
disassociate the name to give a workaround “To proceed in this
situation”.OTOH if we make the tablesync slot unconditionally ERROR for a broken
connection then there is no way to proceed, and the whole (slot_name =
NONE) workaround becomes ineffectual. Note - the current patch code is
only logging when the user has already disassociated the slot name; of
course normally (when the slot name was not disassociated) table slots
will give ERROR for broken connections.IMO, if the user has disassociated the slot name then they have
already made their decision that they REALLY DO want to “proceed in
this situation”. So I thought we should let them proceed.
Okay, if we want to go that way then we should add some documentation
about it. Currently, the slot name used by the apply worker is known to
the user because either it is specified by the user or it defaults to the
subscription name, so the user can manually remove that slot later, but
that is not true for tablesync slots. I think we need to update both
the Drop Subscription page [1] and the logical-replication-subscription
page [2] where we have mentioned temporary slots, and in the end "Here
are some scenarios: .." to mention these slots and probably how
their names are generated so that in such special situations users can
drop them manually.
[1]: https://www.postgresql.org/docs/devel/sql-dropsubscription.html
[2]: https://www.postgresql.org/docs/devel/logical-replication-subscription.html
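For example (the OIDs here are purely illustrative), the generated names follow the patch's printf formats:

char	   *slotname = psprintf("pg_%u_sync_%u", 16394, 16490);	/* "pg_16394_sync_16490" */
char		originname[NAMEDATALEN];

snprintf(originname, sizeof(originname), "pg_%u_%u", 16394, 16490);	/* "pg_16394_16490" */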
--
With Regards,
Amit Kapila.
On Tue, Jan 5, 2021 at 3:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
5. Is it possible to write a testcase where we fail (say due to pk
violation or some other error) after the initial copy is done, then
remove the conflicting row and allow a copy to be completed? If we
already have any such test then it is fine.

Causing a PK violation during the initial copy is not a problem to
test, but doing it after the initial copy is difficult. I have done
exactly this test scenario before but I thought it cannot be
automated. E.g. to cause a PK violation error somewhere between
COPYDONE and SYNCDONE means that the offending insert (the one which
tablesync will fail to replicate) has to be sent while the tablesync
is in CATCHUP mode. But AFAIK that can only be achieved using the
debugger to get the timing right.
Yeah, I am also not able to think of any way to automate such a test.
I was thinking about what could go wrong if we error out in that
stage. The only thing that could be problematic is if we somehow make
the slot and replication origin used during copy dangling. I think if
the tablesync is restarted after an error then we will clean those up,
which will normally be the case, but what if the tablesync worker is not
started again? I think the only possibility of the tablesync worker not
started again is if during Alter Subscription ... Refresh Publication,
we remove the corresponding subscription rel (see
AlterSubscription_refresh, I guess it could happen if one has dropped
the relation from publication). I haven't tested this with your patch
but if such a possibility exists then we need to think of cleaning up
slot and origin when we remove subscription rel. What do you think?
--
With Regards,
Amit Kapila.
On Wed, Jan 6, 2021 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 5, 2021 at 3:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
5. Is it possible to write a testcase where we fail (say due to pk
violation or some other error) after the initial copy is done, then
remove the conflicting row and allow a copy to be completed? If we
already have any such test then it is fine.

Causing a PK violation during the initial copy is not a problem to
test, but doing it after the initial copy is difficult. I have done
exactly this test scenario before but I thought it cannot be
automated. E.g. to cause a PK violation error somewhere between
COPYDONE and SYNCDONE means that the offending insert (the one which
tablesync will fail to replicate) has to be sent while the tablesync
is in CATCHUP mode. But AFAIK that can only be achieved using the
debugger to get the timing right.Yeah, I am also not able to think of any way to automate such a test.
I was thinking about what could go wrong if we error out in that
stage. The only thing that could be problematic is if we somehow make
the slot and replication origin used during copy dangling. I think if
the tablesync is restarted after an error then we will clean those up,
which will normally be the case, but what if the tablesync worker is not
started again? I think the only possibility of the tablesync worker not
started again is if during Alter Subscription ... Refresh Publication,
we remove the corresponding subscription rel (see
AlterSubscription_refresh, I guess it could happen if one has dropped
the relation from publication). I haven't tested this with your patch
but if such a possibility exists then we need to think of cleaning up
slot and origin when we remove subscription rel. What do you think?
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.
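As a rough sketch, the shared helper might look something like this (the function name and placement are hypothetical, only to show the intended shape of the refactor):

/* Hypothetical common cleanup, callable once per removed relation. */
static void
tablesync_cleanup_for_rel(WalReceiverConn *wrconn, Oid subid, Oid relid)
{
	char	   *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
	char		originname[NAMEDATALEN];
	RepOriginId originid;

	/* Drop the tablesync slot on the publisher, when a connection exists. */
	if (wrconn)
		ReplicationSlotDropAtPubNode(wrconn, syncslotname);

	/* Drop the tablesync origin tracking, if it exists. */
	snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
	originid = replorigin_by_name(originname, true);
	if (originid != InvalidRepOriginId)
		replorigin_drop(originid, false);

	pfree(syncslotname);
}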
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.
Fair enough. BTW, I have analyzed whether we need any modifications to
pg_dump/restore for this patch as this changes the state of one of the
fields in the system table and concluded that we don't need any
change. For subscriptions, we don't dump any of the information from
pg_subscription_rel, rather we just dump subscriptions with the
connect option as false which means users need to enable the
subscription and refresh publication after restore. I have checked
this in the code and tested it as well. The related information is
present in pg_dump doc page [1], see from "When dumping logical
replication subscriptions ....".
[1]: https://www.postgresql.org/docs/devel/app-pgdump.html
--
With Regards,
Amit Kapila.
PSA the v11 patch for the Tablesync Solution1.
Difference from v10:
- Addresses several recent review comments.
- pg_indent has been run
Hi
I took a look into the patch and have some comments.
1.
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY.
I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE,
But it seems the SUBREL_STATE_TCOPYDONE is actually set before SUBREL_STATE_SYNCWAIT [1].
Did I miss something here?
[1]-----------------
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
...
/*
* We are done with the initial data synchronization, update the state.
*/
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
------------------
2.
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
The character representing 'data has been copied' in the catalog seems different from the macro define.
Best regards,
houzj
Thank you for the feedback.
On Thu, Jan 7, 2021 at 12:45 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
PSA the v11 patch for the Tablesync Solution1.
Difference from v10:
- Addresses several recent review comments.
- pg_indent has been runHi
I took a look into the patch and have some comments.
1.
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY.

I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE,
but it seems the SUBREL_STATE_TCOPYDONE is actually set before
SUBREL_STATE_SYNCWAIT [1]. Did I miss something here?

[1]
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
...
/*
* We are done with the initial data synchronization, update the state.
*/
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
Thanks for reporting this mistake. I will correct the comment for the
next patch (v12)
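(Presumably the corrected comment will read something like: INIT -> DATASYNC -> (tablesync TCOPYDONE) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.)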
2.
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed + * (sublsn NULL) */ The character representing 'data has been copied' in the catalog seems different from the macro define.
Yes, same was already previously reported [1]. It will be fixed in the next patch (v12).
[1]: /messages/by-id/CAA4eK1Kyi037XZzyrLE71MS2KoMmNSqa6RrQLdSCeeL27gnL+A@mail.gmail.com
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.Fair enough.
I think before implementing, we should once try to reproduce this
case. I understand this is a timing issue and can be reproduced only
with the help of debugger but we should do that.
BTW, I have analyzed whether we need any modifications to
pg_dump/restore for this patch as this changes the state of one of the
fields in the system table and concluded that we don't need any
change. For subscriptions, we don't dump any of the information from
pg_subscription_rel, rather we just dump subscriptions with the
connect option as false which means users need to enable the
subscription and refresh publication after restore. I have checked
this in the code and tested it as well. The related information is
present in pg_dump doc page [1], see from "When dumping logical
replication subscriptions ....".
I have further analyzed that we don't need to do anything w.r.t
pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the
schema info of the old cluster and then restore it to the new cluster.
And, we know that pg_dump ignores the info in pg_subscription_rel, so
we don't need to change anything as our changes are specific to the
state of one of the columns in pg_subscription_rel. I have not tested
this but we should test it by having some relations in not_ready state
and then allow the old cluster (<=PG13) to be upgraded to new (pg14)
both with and without this patch and see if there is any change in
behavior.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v12 patch for the Tablesync Solution1.
Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by
DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state
====
Features:
* The tablesync slot is now permanent instead of temporary. The
tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription
and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then
it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
* Patch applies with whitespace warning
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v12-0001-Tablesync-Solution1.patch (application/octet-stream)
From e5b823cb2260b9701fd13dd90c5d1681ba5b4831 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 7 Jan 2021 16:52:25 +1100
Subject: [PATCH v12] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 16 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 212 ++++++++++++++++-------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 249 +++++++++++++++++++++++-----
src/backend/replication/logical/worker.c | 18 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
9 files changed, 394 insertions(+), 115 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..6d294c8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>t</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..66290d6 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,16 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +303,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..0142278 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,8 +929,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1016,73 +1017,176 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Drop the tablesync slot.
+ *
+ * (For SYNCDONE/READY states the tablesync slot will already be
+ * dropped).
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ PG_TRY();
+ {
+ if (!wrconn)
+ {
+ /*
+ * It is only possible to reach here without ERROR for
+ * a broken publisher connection if the subscription
+ * slotname is already NONE/NULL.
+ *
+ * This means the user has disassociated the
+ * subscription from the replication slot deliberately
+ * so that the DROP SUBSCRIPTION can proceed to
+ * completion. See
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only LOG a message that the
+ * tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP
+ * SUBSCRIPTION from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ }
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
@@ -1103,13 +1207,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b299285 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..979a2ac 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -43,13 +43,17 @@
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediary state TCOPYDONE which comes after
+ * DATASYNC and before SYNCWAIT. This state indicates that the initial
+ * table copy phase has completed, so if the worker crashes before
+ * reaching SYNCDONE the copy will not be re-attempted.
* - Once the state is set to SYNCDONE, the apply will continue tracking
* the table until it reaches the SYNCDONE stream position, at which
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker TCOPYDONE) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +63,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog TCOPYDONE
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +79,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog TCOPYDONE
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,6 +108,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -270,30 +278,62 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char *syncslotname;
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
-
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +452,35 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +877,31 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +918,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +945,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +961,31 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1001,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1026,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during the table copy then clean up the
+ * created slot.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
+ * got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..2c80405 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..e617602 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v12-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v12-0002-Tablesync-extra-logging.patchDownload
From 1fc128ec7c828bf569c1aff64872368d8a46f47b Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 7 Jan 2021 17:08:11 +1100
Subject: [PATCH v12] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 6 ++++--
src/backend/replication/logical/tablesync.c | 19 ++++++++++++++-----
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 0142278..a627271 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1119,8 +1119,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
else
{
- elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
}
}
PG_FINALLY();
@@ -1135,8 +1136,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 979a2ac..306be98 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -301,8 +301,9 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
MyLogicalRepWorker->relid);
PG_TRY();
{
- elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_FINALLY();
{
@@ -473,8 +474,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".", originname);
}
}
@@ -973,15 +975,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1030,6 +1034,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1061,8 +1066,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during the table copy then clean up the
* created slot.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
pfree(slotname);
slotname = NULL;
@@ -1079,9 +1085,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* Origin tracking does not exist. Create it now, and advance to LSN
* got from walrcv_create_slot.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1104,7 +1113,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
On Mon, Jan 4, 2021 at 8:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 30, 2020 at 11:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress the drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();
So, does this situation happen when we try to drop the subscription after
the state is changed to syncdone but not yet to ready? If so, then can't
we write a function GetSubscriptionNotDoneRelations similar to
GetSubscriptionNotReadyRelations where we get a list of relations that
are not in the done stage? I think this should be safe because once we
are here we shouldn't be allowed to start a new worker and old workers
are already stopped by this function.
Yes, but I don't see how adding such a function is an improvement over
the existing code.
The advantage is that we don't need to use try..catch to deal with
such conditions, which I don't think is a good way to deal with such
cases. Also, even after using try...catch, we can still leak the
slots because the patch drops the slot after changing the state to
syncdone, and if there is any error while dropping the slot, it simply
skips it. So, it is possible that the rel state is syncdone but the
slot still exists and we get an error due to some different reason,
and then we will silently skip it again and allow the subscription to
be dropped.
I think instead what we should do is to drop the slot before we change
the rel state to syncdone. Also, if the tablesync worker fails to drop
the slot, it should try to drop it again after restart. In
DropSubscription, we can then check if the rel state is not SYNCDONE or
READY, and drop the corresponding slots.
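To spell out the crash-safety argument for that ordering, here is a toy
model (standalone C with made-up stand-ins such as drop_slot and
persist_state; not the actual PostgreSQL code). Because SYNCDONE is
persisted only after the slot drop, any crash in between leaves a state
from which a restarted worker knows the drop must be retried:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins, hypothetical names for illustration only. */
static bool slot_exists = true; /* tablesync slot on the publisher */
static char relstate = 'c';     /* 'c' = catchup, 's' = SYNCDONE */

static void drop_slot(void)       { slot_exists = false; }
static void persist_state(char s) { relstate = s; }

/* Sync worker finishing up; may crash between the two steps. */
static void finish_sync(bool crash_in_between)
{
	drop_slot();            /* 1. drop the tablesync slot first */
	if (crash_in_between)
		return;             /* crash: relstate is still not SYNCDONE */
	persist_state('s');     /* 2. only then persist SYNCDONE */
}

/*
 * Restarted worker / DropSubscription: a relstate other than SYNCDONE
 * means the slot drop may still be pending, so retry it. SYNCDONE
 * guarantees the slot is already gone.
 */
static void recover(void)
{
	if (relstate != 's' && slot_exists)
		drop_slot();
}

int main(void)
{
	finish_sync(true);      /* simulate the worst-case crash */
	recover();
	printf("slot_exists=%d relstate=%c\n", slot_exists, relstate);
	return 0;
}

With the opposite ordering (persist SYNCDONE first), a crash in between
would leave relstate = 's' while the slot still exists, and recover()
would wrongly skip the drop, which is exactly the leak described above.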
Fixed as suggested in latest patch [v12]
----
[v12] = /messages/by-id/CAHut+PsonJzarxSBWkOM=MjoEpaq53ShBJoTT9LHJskwP3OvZA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
3.
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
You can mention in the comments that sublsn will be NULL for this
state, as it is mentioned for other similar states. Can we think of
using some lower-case letter for this, as all other states are in
lower-case except for this one, which makes it look a bit odd? We can
use 'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am
fine if you have any better ideas.
Fixed in latest patch [v11]
It is still not reflected in the docs. See below:
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
 State code:
 <literal>i</literal> = initialize,
 <literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
 <literal>s</literal> = synchronized,
Fixed in latest patch [v12]
----
[v12] = /messages/by-id/CAHut+PsonJzarxSBWkOM=MjoEpaq53ShBJoTT9LHJskwP3OvZA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Jan 6, 2021 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Okay, if we want to go that way then we should add some documentation
about it. Currently, the slot name used by apply worker is known to
the user because either it is specified by the user or the default is
subscription name, so the user can manually remove that slot later but
that is not true for tablesync slots. I think we need to update both
the Drop Subscription page [1] and logical-replication-subscription
page [2] where we have mentioned temporary slots and in the end "Here
are some scenarios: .." to mention these slots and probably how
their names are generated so that in such special situations users can
drop them manually.
[1] - https://www.postgresql.org/docs/devel/sql-dropsubscription.html
[2] - https://www.postgresql.org/docs/devel/logical-replication-subscription.html
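As a concrete illustration of that manual cleanup, a small libpq program
like the sketch below could find and drop any leftover tablesync slots on
the publisher. This is only a sketch: the connection string is a
placeholder, and the LIKE pattern simply assumes the generated
pg_%u_sync_%u naming, so the matched slots should be verified as truly
orphaned before dropping them.

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
	/* Placeholder connection string; point it at the publisher. */
	PGconn	   *conn = PQconnectdb("host=publisher dbname=postgres");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* Drop inactive slots matching the tablesync naming convention. */
	res = PQexec(conn,
				 "SELECT pg_drop_replication_slot(slot_name) "
				 "FROM pg_replication_slots "
				 "WHERE slot_name LIKE 'pg\\_%\\_sync\\_%' AND NOT active");
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
		fprintf(stderr, "cleanup failed: %s", PQerrorMessage(conn));

	PQclear(res);
	PQfinish(conn);
	return 0;
}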
PG docs updated in latest patch [v12]
----
[v12] = /messages/by-id/CAHut+PsonJzarxSBWkOM=MjoEpaq53ShBJoTT9LHJskwP3OvZA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.
Fair enough.
I think before implementing, we should once try to reproduce this
case. I understand this is a timing issue and can be reproduced only
with the help of a debugger, but we should do that.
FYI, I was able to reproduce this case in the debugger. PSA logs showing details.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
PSA the v12 patch for the Tablesync Solution1.
Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state
Hi
I looked into the new patch and have some comments.
1.
+ /* Setup replication origin tracking. */
①+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
②+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
There are two different styles of code which check whether originid is valid.
Both are fine, but do you think it’s better to use the same style here?
2.
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediary state TCOPYDONE which comes after
+ * DATASYNC and before SYNCWAIT. This state indicates that the initial
This comment about TCOPYDONE is better placed at [1], which is between
DATASYNC and SYNCWAIT:
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
[1]:
* - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
* waits for state change.
3.
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
But its actual maximum size is (3 + 10 + 6 + 10 + '\0') = 30, which is not NAMEDATALEN - 1.
Should we change the comment here?
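The arithmetic can be double-checked with a throwaway snippet (assuming
the default NAMEDATALEN of 64): the worst case is 29 characters plus the
terminator, which is the (3 + 10 + 6 + 10 + '\0') = 30 above, well under
NAMEDATALEN - 1.

#include <assert.h>
#include <stdio.h>

#define NAMEDATALEN 64			/* default build value (an assumption) */

int main(void)
{
	char		buf[NAMEDATALEN];

	/* Worst case: both %u arguments print 10 digits (UINT_MAX). */
	int			len = snprintf(buf, sizeof(buf), "pg_%u_sync_%u",
							   4294967295u, 4294967295u);

	printf("max length = %d chars, %d bytes with the terminator\n",
		   len, len + 1);
	assert(len == 29 && len + 1 <= NAMEDATALEN);
	return 0;
}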
Best regards,
houzj
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.
Fair enough.
I think before implementing, we should once try to reproduce this
case. I understand this is a timing issue and can be reproduced only
with the help of a debugger, but we should do that.
FYI, I was able to reproduce this case in the debugger. PSA logs showing details.
Thanks for reproducing this, as I was worried about exactly this case. I
have one question related to logs:
##
## ALTER SUBSCRIPTION to REFRESH the publication
## This blocks on some latch until the tablesync worker dies, then it continues
##
Did you check which exact latch or lock blocks this? It is important
to retain this interlock because otherwise, even if we decide to drop the
slot (and/or origin), the tablesync worker might continue.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v13 patch for the Tablesync Solution1.
Differences from v12:
+ Fixed whitespace errors of v12-0001
+ Modify TCOPYDONE state comment (houzj feedback)
+ WIP fix for AlterSubscription_refresh (Amit feedback)
====
Features:
* The tablesync slot is now permanent instead of temporary. The
tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription
and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then
it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
* ALTER PUBLICATION DROP TABLE can mean knowledge of tablesyncs gets
lost causing resource cleanup to be missed. There is a WIP fix for
this in the AlterSubscription_refresh, however it is not entirely
correct; there are known race conditions. See FIXME comments.
---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 7, 2021 at 6:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v12 patch for the Tablesync Solution1.
Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state
====
Features:
* The tablesync slot is now permanent instead of temporary. The
tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription
and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then
it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
* Patch applies with whitespace warning
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v13-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v13-0002-Tablesync-extra-logging.patchDownload
From 7dd4e8786314c6f98f9363411e2ff693c6aaad02 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 8 Jan 2021 20:04:23 +1100
Subject: [PATCH v13] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 17 ++++++++++++-----
src/backend/replication/logical/tablesync.c | 19 ++++++++++++++-----
2 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 51f5e40..dec1ae5 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -652,7 +652,9 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
{
Oid subid = sub->oid;
+ elog(LOG, "!!>> AlterSubscription_refresh: before logicalrep_worker_stop_at_commit");
logicalrep_worker_stop_at_commit(subid, relid);
+ elog(LOG, "!!>> AlterSubscription_refresh: after logicalrep_worker_stop_at_commit");
/*
* Cleanup any remaining tablesync resources.
@@ -665,6 +667,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
/* Last known rel state. */
state = GetSubscriptionRelState(subid, relid, &statelsn);
+ elog(LOG, "!!>> AlterSubscription_refresh: relid %u had state %c", relid, state);
RemoveSubscriptionRel(sub->oid, relid);
@@ -703,8 +706,9 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
ereport(ERROR,
(errmsg("could not connect to the publisher: %s", err)));
- elog(DEBUG1, "AlterSubscription_refresh: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "AlterSubscription_refresh: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> AlterSubscription_refresh: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_FINALLY();
{
@@ -721,12 +725,13 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"", originname);
}
}
- ereport(DEBUG1,
+ ereport(LOG,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
@@ -1196,8 +1201,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
else
{
- elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
}
}
PG_FINALLY();
@@ -1212,8 +1218,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index d5d0840..4d9d3fa 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -299,8 +299,9 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
MyLogicalRepWorker->relid);
PG_TRY();
{
- elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_FINALLY();
{
@@ -471,8 +472,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".", originname);
}
}
@@ -971,15 +973,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ elog(LOG, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1028,6 +1032,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1059,8 +1064,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during the table copy then clean up the
* created slot.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
pfree(slotname);
slotname = NULL;
@@ -1077,9 +1083,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* Origin tracking does not exist. Create it now, and advance to LSN
* got from walrcv_create_slot.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1102,7 +1111,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
v13-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v13-0001-Tablesync-Solution1.patchDownload
From c4163e158fe2e0b7691062d3a14f5637f707e695 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 8 Jan 2021 19:09:43 +1100
Subject: [PATCH v13] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
* ALTER PUBLICATION DROP TABLE can mean knowledge of tablesyncs gets lost causing resource cleanup to be missed. There is a WIP fix for this in the AlterSubscription_refresh, however it is not entirely correct; there are known race conditions. See FIXME comments.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 16 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 293 ++++++++++++++++++++++------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 251 ++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
9 files changed, 473 insertions(+), 119 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..6d294c8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>t</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..8b23e03 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,16 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +303,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..51f5e40 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -649,9 +650,81 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
if (!bsearch(&relid, pubrel_local_oids,
list_length(pubrel_names), sizeof(Oid), oid_cmp))
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid subid = sub->oid;
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ logicalrep_worker_stop_at_commit(subid, relid);
+
+ /*
+ * Cleanup any remaining tablesync resources.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(subid, relid, &statelsn);
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ /*
+ * Drop tablesync slot.
+ *
+ * Should only be necessary for TCOPYDONE state (SYNCDONE and
+ * READY would have already dropped their slot)
+ *
+ * FIXME - Usually this cleanup would be OK, but will not
+ * always be OK because the logicalrep_worker_stop_at_commit
+ * only "flags" the worker to be stopped in the near future
+ * but meanwhile it may still be running. In this case there
+ * could be a race between the tablesync worker and this code
+ * to see who will succeed with the tablesync drop (and the
+ * loser will ERROR).
+ *
+ * FIXME - Also, checking the state is not guaranteed
+ * correct because the state might be TCOPYDONE when we checked
+ * but has since progressed to SYNCDONE
+ */
+
+ if (state == SUBREL_STATE_TCOPYDONE)
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ /*
+ * XXX - Should optimize this to avoid multiple
+ * connect/disconnect.
+ */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
+
+ elog(DEBUG1, "AlterSubscription_refresh: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ /* We are done with the remote side, close connection. */
+ walrcv_disconnect(wrconn);
+
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
ereport(DEBUG1,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
@@ -928,8 +1001,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1016,73 +1089,181 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
- *
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Drop the tablesync slot.
+ *
+ * The tablesync slot was created in the same tx as where the
+ * TCOPYDONE state was set.
+ *
+ * For SYNCDONE/READY states the tablesync slot has already been
+ * dropped by the tablesync worker.
+ *
+ * So this leaves only the TCOPYDONE state to be taken care of.
+ */
+ if (rstate->state == SUBREL_STATE_TCOPYDONE)
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ PG_TRY();
+ {
+ if (!wrconn)
+ {
+ /*
+ * It is only possible to reach here without ERROR for
+ * a broken publisher connection if the subscription
+ * slotname is already NONE/NULL.
+ *
+ * This means the user has disassociated the
+ * subscription from the replication slot deliberately
+ * so that the DROP SUBSCRIPTION can proceed to
+ * completion. See
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only LOG a message that the
+ * tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP
+ * SUBSCRIPTION from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ }
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
@@ -1103,13 +1284,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b299285 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..d5d0840 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a TCOPYDONE state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker TCOPYDONE) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog TCOPYDONE
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog TCOPYDONE
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,6 +106,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -270,30 +276,62 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char *syncslotname;
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
-
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +450,35 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +875,31 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +916,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +943,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +959,31 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +999,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1024,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
+ * got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..2c80405 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..e617602 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
PSA the v12 patch for the Tablesync Solution1.
Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state

Hi
I looked into the new patch and have some comments.
1.
+ /* Setup replication origin tracking. */
①
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
②
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {

There are two different styles of code here which check whether originid is valid.
Both are fine, but do you think it's better to use the same style in both places?
Yes. I think the 1st style is better, so I used OidIsValid for all
the new code of the patch.
But the check in DropSubscription is an exception; there I used the 2nd
style, but ONLY to be consistent with another originid check which
already existed in that same function.
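For context, here is a minimal illustration (not from the patch) of why the two styles are interchangeable: InvalidRepOriginId is 0, which is also the value that OidIsValid() tests against.

    RepOriginId originid = replorigin_by_name(originname, true /* missing_ok */);

    /* style ①, used by most of the new code */
    if (OidIsValid(originid))
        replorigin_drop(originid, false);

    /* style ②, equivalent since InvalidRepOriginId is 0 */
    if (originid != InvalidRepOriginId)
        replorigin_drop(originid, false);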
2.
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediary state TCOPYDONE which comes after
+ * DATASYNC and before SYNCWAIT. This state indicates that the initial

This comment about TCOPYDONE would be better placed at [1]*, which is between DATASYNC and SYNCWAIT:
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
[1]*
* - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
* waits for state change.
Agreed. I have moved the comment per your suggestion (and I also
re-worded it again).
Fixed in latest patch [v13]
3.
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which is not NAMEDATALEN - 1.
Should we change the comment here?
The comment wording is a remnant from older code which had a
differently formatted slot name.
I think the comment is still valid, albeit maybe unnecessary, since in
the current code the tablesync slot name length is fixed. But I left
the older comment here as a safety reminder in case some future change
wants to modify the slot name. What do you think?
----
[v13] = /messages/by-id/CAHut+Pvby4zg6kM1RoGd_j-xs9OtPqZPPVhbiC53gCCRWdNSrw@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Fri, Jan 8, 2021 at 2:55 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
3.
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which is not NAMEDATALEN - 1.
Should we change the comment here?

The comment wording is a remnant from older code which had a
differently formatted slot name.
I think the comment is still valid, albeit maybe unnecessary, since in
the current code the tablesync slot name length is fixed. But I left
the older comment here as a safety reminder in case some future change
wants to modify the slot name. What do you think?
I find it quite confusing. The comments should reflect the latest
code. You can probably say in some form that the length of slotname
shouldn't exceed NAMEDATALEN because of remote node constraints on
slot name length. Also, probably the StaticAssert on NAMEDATALEN is
not required.
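For reference, the bound is easy to work out (a hedged aside, not part of the patch; it assumes OIDs are unsigned 32-bit and so print as at most 10 decimal digits):

    /*
     * Worst case for "pg_%u_sync_%u": both OIDs at UINT32_MAX.
     * "pg_" (3) + "4294967295" (10) + "_sync_" (6) + "4294967295" (10)
     * = 29 characters, plus the terminating '\0' = 30 bytes, which is
     * comfortably under the default NAMEDATALEN of 64.
     */
    char *syncslotname = psprintf("pg_%u_sync_%u", 4294967295U, 4294967295U);

    Assert(strlen(syncslotname) == 29);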
1.
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
The last line seems too long. I think we are not strict about the 80-char
limit in docs, but it is good to stay close to it; however, this line
appears quite long.
2.
+ /*
+ * Cleanup any remaining tablesync resources.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;
I have already mentioned previously that we should not use this new style
of code (using { to localize the scope of variables). I don't
know about others, but I find such code difficult to read. You
might want to consider moving this whole block to a separate function.
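One possible shape for that (a rough sketch only; the helper name and exact signature are made up, but the calls inside are the ones the patch already uses):

    /*
     * Hypothetical helper, sketched from the existing cleanup block:
     * drop the tablesync slot (if the copy finished but the sync did
     * not) and the tablesync origin tracking for one relation.
     */
    static void
    DropTablesyncResources(WalReceiverConn *wrconn, Oid subid, Oid relid,
                           char state)
    {
        char        originname[NAMEDATALEN];
        RepOriginId originid;

        if (state == SUBREL_STATE_TCOPYDONE)
        {
            char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);

            ReplicationSlotDropAtPubNode(wrconn, syncslotname);
            pfree(syncslotname);
        }

        /* Remove the tablesync's origin tracking if it exists. */
        snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
        originid = replorigin_by_name(originname, true);
        if (OidIsValid(originid))
            replorigin_drop(originid, false);
    }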
3.
/*
+ * XXX - Should optimize this to avoid multiple
+ * connect/disconnect.
+ */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
I think it is better to avoid multiple connect/disconnect here. In
this same function, we have already connected to the publisher, so we
should be able to use the same connection.
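A rough sketch of that arrangement (the names are from the patch, the structure is an assumption; the v14 patch further below adopts essentially this shape):

    /* Connect once, near the top of AlterSubscription_refresh. */
    wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
    if (!wrconn)
        ereport(ERROR,
                (errmsg("could not connect to the publisher: %s", err)));

    PG_TRY();
    {
        /* ... fetch the table list and work out the removed relations ... */

        /* ... per-relation slot/origin cleanup reuses the same wrconn ... */
    }
    PG_FINALLY();
    {
        /* Disconnect exactly once, whatever happens above. */
        walrcv_disconnect(wrconn);
    }
    PG_END_TRY();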
4.
process_syncing_tables_for_sync()
{
..
+ /*
+ * Cleanup the tablesync slot.
+ */
+ syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the
tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
..
}
Both here and in DropSubscription(), it seems we are using
PG_TRY..PG_FINALLY just to free the memory even though
ReplicationSlotDropAtPubNode already has try..finally. Can we arrange
the code to move the allocation of syncslotname inside
ReplicationSlotDropAtPubNode to avoid the additional try..finally? BTW,
if the usage of try..finally here is only to free the memory, I am not
sure it is required, because I think we will anyway reset the memory
context where this memory is allocated as part of error handling.
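For illustration, one way to fold the allocation in (just a sketch under the current v13 signatures; the wrapper name is invented):

    /*
     * Hypothetical wrapper: compute and free the slot name here so
     * callers need no PG_TRY just for the pfree. If the drop ERRORs,
     * the name is reclaimed anyway when the containing memory context
     * is reset during error handling, per the point above.
     */
    void
    ReplicationSlotDropAtPubNodeForTablesync(WalReceiverConn *wrconn,
                                             Oid suboid, Oid relid)
    {
        char *syncslotname = ReplicationSlotNameForTablesync(suboid, relid);

        ReplicationSlotDropAtPubNode(wrconn, syncslotname);
        pfree(syncslotname);
    }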
5.
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
I am not very happy with the new state name SUBREL_STATE_TCOPYDONE as
it is quite different from the other adjoining state names and somehow
does not go well with the code. How about SUBREL_STATE_ENDCOPY 'e' or
SUBREL_STATE_FINISHEDCOPY 'f'?
--
With Regards,
Amit Kapila.
On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote:
FYI, I was able to reproduce this case in the debugger. PSA logs showing details.
Thanks for reproducing as I was worried about exactly this case. I
have one question related to logs:

##
## ALTER SUBSCRIPTION to REFRESH the publication
## This blocks on some latch until the tablesync worker dies, then it continues
##

Did you check which exact latch or lock blocks this?
I have checked this myself and the command is waiting on the drop of
origin till the tablesync worker is finished because replorigin_drop()
requires state->acquired_by to be 0 which will only be true once the
tablesync worker exits. I think this is the reason you might have
noticed that the command can't be finished until the tablesync worker
died. So this can't be an interlock between the ALTER SUBSCRIPTION ..
REFRESH command and the tablesync worker, and to that end it seems you
have the below FIXMEs in the patch:
+ * FIXME - Usually this cleanup would be OK, but will not
+ * always be OK because the logicalrep_worker_stop_at_commit
+ * only "flags" the worker to be stopped in the near future
+ * but meanwhile it may still be running. In this case there
+ * could be a race between the tablesync worker and this code
+ * to see who will succeed with the tablesync drop (and the
+ * loser will ERROR).
+ *
+ * FIXME - Also, checking the state is not guaranteed to be
+ * correct because the state might have been TCOPYDONE when we
+ * checked but has since progressed to SYNCDONE
+ */
+
+ if (state == SUBREL_STATE_TCOPYDONE)
+ {
I feel this was okay for an earlier version of the code, but now we
need to stop the tablesync workers before trying to drop the slot, as
we do in DropSubscription. Now, if we do that then it would fix the
race conditions mentioned in the FIXMEs, but still there are a few
more things I am worried about: (a) What if the launcher starts the
tablesync worker again? One idea could be to acquire
AccessExclusiveLock on SubscriptionRelationId as we do in
DropSubscription, which is not a very good idea, but I can't think of
any other good way. (b) The patch is only checking
SUBREL_STATE_TCOPYDONE before dropping the replication slot, but the
slot could have been created even before that (in the
SUBREL_STATE_DATASYNC state). One idea could be to try to drop the
slot and, if we are unable to, simply continue on the assumption that
it didn't exist.
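A rough sketch of idea (b), hedged; the v14 patch below does in fact add a missing_ok parameter along these lines:

    /*
     * Tolerate an absent slot: it may never have been created, e.g. if
     * the worker crashed in DATASYNC before walrcv_create_slot ran.
     */
    ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);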
One minor comment:
1.
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
-
Spurious line removal.
--
With Regards,
Amit Kapila.
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
BTW, I have analyzed whether we need any modifications to
pg_dump/restore for this patch as this changes the state of one of the
fields in the system table and concluded that we don't need any
change. For subscriptions, we don't dump any of the information from
pg_subscription_rel, rather we just dump subscriptions with the
connect option as false which means users need to enable the
subscription and refresh publication after restore. I have checked
this in the code and tested it as well. The related information is
present in pg_dump doc page [1], see from "When dumping logical
replication subscriptions ....".

I have further analyzed that we don't need to do anything w.r.t.
pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the
schema info of the old cluster and then restore it to the new cluster.
And, we know that pg_dump ignores the info in pg_subscription_rel, so
we don't need to change anything as our changes are specific to the
state of one of the columns in pg_subscription_rel. I have not tested
this but we should test it by having some relations in not_ready state
and then allow the old cluster (<=PG13) to be upgraded to new (pg14)
both with and without this patch and see if there is any change in
behavior.
I have tested this scenario: I stopped a server running PG_13 while the
subscription table sync was in progress.
One of the tables in pg_subscription_rel was still in 'd' state (DATASYNC)
postgres=# select * from pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+------------
16424 | 16384 | d |
16424 | 16390 | r | 0/247A63D8
16424 | 16395 | r | 0/247A6410
16424 | 16387 | r | 0/247A6448
(4 rows)
Then I initiated the pg_upgrade to PG_14, both with the patch and without the patch.
I see that the subscription exists but is not enabled:
postgres=# select * from pg_subscription;
oid | subdbid | subname | subowner | subenabled | subbinary |
substream | subconninfo | subslotname |
subsynccommit | subpublications
-------+---------+---------+----------+------------+-----------+-----------+------------------------------------------+-------------+---------------+-----------------
16407 | 16401 | tap_sub | 10 | f | f | f
| host=localhost port=6972 dbname=postgres | tap_sub | off
| {tap_pub}
(1 row)
and looking at the pg_subscription_rel:
postgres=# select * from pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+----------
(0 rows)
As can be seen, none of the data in pg_subscription_rel has been
copied over. The same behaviour is seen both with the patch and
without the patch.
regards,
Ajin Cherian
Fujitsu Australia
On Mon, Jan 11, 2021 at 3:53 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
BTW, I have analyzed whether we need any modifications to
pg_dump/restore for this patch as this changes the state of one of the
fields in the system table and concluded that we don't need any
change. For subscriptions, we don't dump any of the information from
pg_subscription_rel, rather we just dump subscriptions with the
connect option as false which means users need to enable the
subscription and refresh publication after restore. I have checked
this in the code and tested it as well. The related information is
present in pg_dump doc page [1], see from "When dumping logical
replication subscriptions ....".

I have further analyzed that we don't need to do anything w.r.t.
pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the
schema info of the old cluster and then restore it to the new cluster.
And, we know that pg_dump ignores the info in pg_subscription_rel, so
we don't need to change anything as our changes are specific to the
state of one of the columns in pg_subscription_rel. I have not tested
this but we should test it by having some relations in not_ready state
and then allow the old cluster (<=PG13) to be upgraded to new (pg14)
both with and without this patch and see if there is any change in
behavior.

I have tested this scenario: I stopped a server running PG_13 while the
subscription table sync was in progress.
Thanks for the test. This confirms my analysis and we don't need any
change in pg_dump or pg_upgrade for this patch.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v14 patch for the Tablesync Solution1.
Main differences from v13:
+ Addresses all review comments 1-5, posted 9/Jan [ak9]
+ Addresses review comment 1, posted 11/Jan [ak11]
+ Modifications per suggestion [ak11] to handle race scenarios during
Drop/AlterSubscription
+ Changed LOG to WARNING if DropSubscription unable to drop tablesync slot
[ak9] = /messages/by-id/CAA4eK1+gUBxKcYWg+MCC6Qbw-My+2wKUct+iFtr-_HgundUUBQ@mail.gmail.com
[ak11] = /messages/by-id/CAA4eK1KGUt86A7CfuQW6OeDvAhEbVk8VOBJmcoZjrYBn965kOA@mail.gmail.com
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for
DropSubscription, AlterSubscription_refresh and for
process_syncing_tables_for_sync functions. Drop/AlterSubscription will
issue WARNING instead of ERROR in case the slot drop fails.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply
worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* The AlterSubscription_refresh (v14+) is now more similar to
DropSubscription w.r.t. stopping workers for any "removed" tables.
* Updates to PG docs.
TODO / Known Issues:
* Minor review comments
===
Also PSA some detailed logging evidence of some test scenarios
involving Drop/AlterSubscription:
+ Test-20210112-AlterSubscriptionRefresh-ok.txt =
AlterSubscription_refresh which successfully drops a tablesync slot
+ Test-20210112-AlterSubscriptionRefresh-warning.txt =
AlterSubscription_refresh gives WARNING that it cannot drop the
tablesync slot (which no longer exists)
+ Test-20210112-DropSubscription-warning.txt = DropSubscription with a
disassociated slot_name gives a WARNING that it cannot drop the
tablesync slot (due to broken connection)
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v14-0002-Tablesync-extra-logging.patch (application/octet-stream)
From 77cc1f7bc334ebb2b6405e42a465c6253d958925 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 12 Jan 2021 22:30:59 +1100
Subject: [PATCH v14] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 29 ++++++++++++++++++----
src/backend/replication/logical/tablesync.c | 37 +++++++++++++++++++++++++----
2 files changed, 56 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index f94243b..b5f9d56 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -665,11 +665,18 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
XLogRecPtr statelsn;
/* Immediately stop the worker. */
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: before logicalrep_worker_stop");
logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
logicalrep_worker_stop(subid, relid); /* stop immediately */
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: after logicalrep_worker_stop");
/* Last known rel state. */
state = GetSubscriptionRelState(subid, relid, &statelsn);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: relid %u had state %c",
+ relid, state);
RemoveSubscriptionRel(sub->oid, relid);
@@ -692,10 +699,13 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- elog(DEBUG1,
+ elog(LOG,
"AlterSubscription_refresh: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/* Remove the tablesync's origin tracking if exists. */
@@ -703,13 +713,16 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
+ elog(LOG,
"AlterSubscription_refresh: dropping origin tracking for \"%s\"",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"",
+ originname);
}
- ereport(DEBUG1,
+ ereport(LOG,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
@@ -1191,10 +1204,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
else
{
- elog(DEBUG1,
+ elog(LOG,
"DropSubscription: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> DropSubscription: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
}
@@ -1203,10 +1219,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1,
+ elog(LOG,
"DropSubscription: dropping origin tracking for \"%s\"",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> DropSubscription: dropped origin tracking for \"%s\"",
+ originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c5a95fc..e4ccd8b 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -299,8 +299,11 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
MyLogicalRepWorker->relid,
syncslotname);
- elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -466,8 +469,11 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
}
@@ -966,15 +972,21 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
+ elog(LOG, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1023,6 +1035,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1054,8 +1069,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1072,9 +1090,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* Origin tracking does not exist. Create it now, and advance to LSN
* got from walrcv_create_slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1097,7 +1124,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
v14-0001-Tablesync-Solution1.patch (application/octet-stream)
From 6a1e27cb36c72c5fe9a29550118ac845f876c200 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 12 Jan 2021 22:23:54 +1100
Subject: [PATCH v14] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription, AlterSubscription_refresh and for process_syncing_tables_for_sync functions. Drop/AlterSubscription will issue WARNING instead of ERROR in case the slot drop fails.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t. stopping workers for any "removed" tables.
* Updates to PG docs.
TODO / Known Issues:
* Minor review comments
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 454 ++++++++++++++++++++--------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 244 ++++++++++++---
src/backend/replication/logical/worker.c | 18 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
9 files changed, 553 insertions(+), 194 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..2e46a49 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..f94243b 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -566,100 +567,165 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ Relation rel;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ pubrel_local_oids[off++] = relid;
- pubrel_local_oids[off++] = relid;
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Lock pg_subscription with AccessExclusiveLock to ensure that the
+ * launcher doesn't restart new worker for the ones we are about to
+ * stop.
+ */
+ rel = table_open(SubscriptionRelationId, AccessExclusiveLock);
+
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ Oid subid = sub->oid;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;
+
+ /* Immediately stop the worker. */
+ logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
+ logicalrep_worker_stop(subid, relid); /* stop immediately */
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(subid, relid, &statelsn);
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to
+ * have already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ if (state != SUBREL_STATE_SYNCDONE && state != SUBREL_STATE_READY)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
- {
- RemoveSubscriptionRel(sub->oid, relid);
+ elog(DEBUG1,
+ "AlterSubscription_refresh: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "AlterSubscription_refresh: dropping origin tracking for \"%s\"",
+ originname);
+ replorigin_drop(originid, false);
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
+
+ table_close(rel, NoLock);
+
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
}
/*
@@ -928,8 +994,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1016,100 +1082,220 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
- *
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ if (!wrconn)
+ {
+ /*
+ * It is only possible to reach here without ERROR for a
+ * broken publisher connection if the subscription
+ * slotname is already NONE/NULL.
+ *
+ * This means the user has disassociated the subscription
+ * from the replication slot deliberately so that the DROP
+ * SUBSCRIPTION can proceed to completion. See
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only give a WARNING message that
+ * the tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP SUBSCRIPTION
+ * from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(WARNING,
+ "no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(DEBUG1,
+ "DropSubscription: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1,
+ "DropSubscription: dropping origin tracking for \"%s\"",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b299285 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..c5a95fc 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,6 +106,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -270,30 +276,57 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +445,35 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +870,30 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints on
+ * slot name length.
+ *
+ * The returned slot name is either returned in the supplied buffer or
+ * palloc'ed in current memory context (if NULL buffer).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname)
+{
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +910,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +937,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL); /* use palloc */
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +954,31 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +994,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1019,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ pfree(slotname);
+ slotname = NULL;
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
+ * got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Sat, Jan 9, 2021 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 2:55 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
3.
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which is not NAMEDATALEN - 1.
Should we change the comment here?

The comment wording is a remnant from older code which had a
differently formatted slot name.
I think the comment is still valid, albeit maybe unnecessary since in
the current code the tablesync slot
name length is fixed. But I left the older comment here as a safety reminder
in case some future change would want to modify the slot name. What do
you think?

I find it quite confusing. The comments should reflect the latest
code. You can probably say in some form that the length of slotname
shouldn't exceed NAMEDATALEN because of remote node constraints on
slot name length. Also, probably the StaticAssert on NAMEDATALEN is
not required.
Modified comment in latest patch [v14]
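As a quick sanity check of that arithmetic, here is a minimal standalone sketch (my own illustration, not patch code; NAMEDATALEN assumed at its default of 64) showing the generated name maxes out at 30 bytes:

#include <stdio.h>

#define NAMEDATALEN 64			/* PostgreSQL default */

int
main(void)
{
	char		buf[NAMEDATALEN];
	/* Worst case: both OIDs at UINT32_MAX, i.e. 10 digits each. */
	unsigned int suboid = 4294967295u;
	unsigned int relid = 4294967295u;
	int			len = snprintf(buf, sizeof(buf), "pg_%u_sync_%u", suboid, relid);

	/* "pg_" (3) + 10 + "_sync_" (6) + 10 = 29 chars, 30 bytes with '\0' */
	printf("%s -> %d chars\n", buf, len);
	return 0;
}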
1.
+ <para>
+  Additional table synchronization slots are normally transient, created
+  internally and dropped automatically when they are no longer needed.
+  These table synchronization slots have generated names:
+  <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>

The last line seems too long. I think we are not strict about the 80-char
limit in docs but it is good to be close to that; however, this
appears quite long.
Fixed in latest patch [v14]
2.
+ /*
+  * Cleanup any remaining tablesync resources.
+  */
+ {
+  char originname[NAMEDATALEN];
+  RepOriginId originid;
+  char state;
+  XLogRecPtr statelsn;

I have already mentioned previously that let's not use this new style
of code (start using { to localize the scope of variables). I don't
know about others, but I find such code difficult to read. You
might want to consider moving this whole block to a separate function.
Removed extra code block in latest patch [v14]
3.
+ /*
+  * XXX - Should optimize this to avoid multiple
+  * connect/disconnect.
+  */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);

I think it is better to avoid multiple connect/disconnect here. In
this same function, we have connected to the publisher, we should be
able to use the same connection.
Fixed in latest patch [v14]
4.
process_syncing_tables_for_sync()
{
..
+ /*
+  * Cleanup the tablesync slot.
+  */
+ syncslotname = ReplicationSlotNameForTablesync(
+  MySubscription->oid,
+  MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+  elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+  ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+  pfree(syncslotname);
+ }
+ PG_END_TRY();
..
}

Both here and in DropSubscription(), it seems we are using
PG_TRY..PG_FINALLY just to free the memory even though
ReplicationSlotDropAtPubNode already has try..finally. Can we arrange
code to move allocation of syncslotname inside
ReplicationSlotDropAtPubNode to avoid the additional try..finally? BTW, if
the usage of try..finally here is only to free the memory, I am not
sure if it is required because I think we will anyway reset the memory
context where this memory is allocated as part of error handling.
Eliminated need for TRY/FINALLY to free syncslotname in latest patch [v14]
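For illustration, the simplification amounts to something like the following sketch (the wrapper name here is hypothetical, not something the patch actually adds):

/*
 * Hypothetical wrapper: build the slot name into a stack buffer and
 * drop it, so nothing palloc'd escapes to the caller and no
 * PG_TRY/PG_FINALLY is needed merely for the pfree.
 */
static void
DropTablesyncSlotAtPubNode(WalReceiverConn *wrconn, Oid suboid, Oid relid)
{
	char		syncslotname[NAMEDATALEN] = {0};

	ReplicationSlotNameForTablesync(suboid, relid, syncslotname);
	ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
}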
5.
 #define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
                                    * NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+                                    * (sublsn NULL) */
 #define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
                                    * apply (sublsn set) */

I am not very happy with the new state name SUBREL_STATE_TCOPYDONE as
it is quite different from the other adjoining state names and does not
go well with the code. How about SUBREL_STATE_ENDCOPY 'e' or
SUBREL_STATE_FINISHEDCOPY 'f'?
Using SUBREL_STATE_FINISHEDCOPY in latest patch [v14]
---
[v14] = /messages/by-id/CAHut+PsPO2vOp+P7U2szROMy15PKKGanhUrCYQ0ffpy9zG1V1A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote:
FYI, I was able to reproduce this case in debugger. PSA logs showing details.
Thanks for reproducing as I was worried about exactly this case. I
have one question related to logs:

##
## ALTER SUBSCRIPTION to REFRESH the publication
## This blocks on some latch until the tablesync worker dies, then it continues
##

Did you check which exact latch or lock blocks this?
I have checked this myself and the command is waiting on the drop of
origin till the tablesync worker is finished because replorigin_drop()
requires state->acquired_by to be 0 which will only be true once the
tablesync worker exits. I think this is the reason you might have
noticed that the command can't be finished until the tablesync worker
died. So this can't be an interlock between ALTER SUBSCRIPTION ..
REFRESH command and the tablesync worker, and to that end it seems you have
the below FIXMEs in the patch:

I have also seen this same blocking reason before in the replorigin_drop().
However, back when I first tested/reproduced the refresh issue
[test-refresh], the AlterSubscription_refresh was still the *original*
unchanged code, so at that time it did not have any replorigin_drop()
in it at all. In any case, in the latest code [v14] the AlterSubscription is
immediately stopping the workers so this question may be moot.
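For reference, the blocking logic being discussed is the guard in replorigin_drop(); roughly (my paraphrase of origin.c, not code from this patch):

/* found our origin, is it busy? */
if (state->acquired_by != 0)
{
	/*
	 * The origin cannot be dropped while some process (here, the
	 * tablesync worker) still has it acquired; the caller either
	 * errors out or sleeps until the owning worker releases it.
	 */
	if (nowait)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("could not drop replication origin with OID %d, in use by PID %d",
						state->roident,
						state->acquired_by)));
	/* ... otherwise wait for the owner to release it and retry ... */
}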
+ * FIXME - Usually this cleanup would be OK, but will not
+ * always be OK because the logicalrep_worker_stop_at_commit
+ * only "flags" the worker to be stopped in the near future
+ * but meanwhile it may still be running. In this case there
+ * could be a race between the tablesync worker and this code
+ * to see who will succeed with the tablesync drop (and the
+ * loser will ERROR).
+ *
+ * FIXME - Also, checking the state is also not guaranteed
+ * correct because state might be TCOPYDONE when we checked
+ * but has since progressed to SYNDONE
+ */
+
+ if (state == SUBREL_STATE_TCOPYDONE)
+ {

I feel this was okay for the earlier code, but now we need to stop the
tablesync workers before trying to drop the slot as we do in
DropSubscription. Now, if we do that then that would fix the race
conditions mentioned in Fixme but still, there are few more things I
am worried about: (a) What if the launcher again starts the tablesync
worker? One idea could be to acquire AccessExclusiveLock on
SubscriptionRelationId as we do in DropSubscription which is not a
very good idea but I can't think of any other good way. (b) the patch
is just checking SUBREL_STATE_TCOPYDONE before dropping the
replication slot but the slot could be created even before that (in
SUBREL_STATE_DATASYNC state). One idea could be we can try to drop the
slot and if we are not able to drop then we can simply continue
assuming it didn't exist.
The code was modified in the latest patch [v14] along the lines suggested.
The workers for removed tables are now immediately stopped (like
DropSubscription does). Although I did include the AccessExclusiveLock
as (a) suggested, AFAIK this was actually ineffective at preventing
the workers from relaunching. Instead, I am using
logicalrep_worker_stop_at_commit to do this - testing shows it as
working ok. Please see the code and latest test logs [v14] for
details.
Also, now the Drop/AlterSubscription will only give a WARNING if unable
to drop the slots, as per suggestion (b). This is also tested [v14].
One minor comment:
1.
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
  MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
  MyLogicalRepWorker->relstate_lsn = current_lsn;
-

Spurious line removal.
Fixed in latest patch [v14]
----
[v14] = /messages/by-id/CAHut+PsPO2vOp+P7U2szROMy15PKKGanhUrCYQ0ffpy9zG1V1A@mail.gmail.com
[test-refresh] = /messages/by-id/CAHut+Pv7YW7AyO_Q_nf9kzogcJcDFQNe8FBP6yXdzowMz3dY_Q@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Also PSA some detailed logging evidence of some test scenarios involving Drop/AlterSubscription:
+ Test-20210112-AlterSubscriptionRefresh-ok.txt = AlterSubscription_refresh which successfully drops a tablesync slot
+ Test-20210112-AlterSubscriptionRefresh-warning.txt = AlterSubscription_refresh gives WARNING that it cannot drop the tablesync slot (which no longer exists)
+ Test-20210112-DropSubscription-warning.txt = DropSubscription with a disassociated slot_name gives a WARNING that it cannot drop the tablesync slot (due to broken connection)
Hi
* The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t. stopping workers for any "removed" tables.
I have a question about the above feature.
With the patch, it seems the worker is not stopped in the case of [1].
I probably missed something; should we stop the worker in such a case?
[1]: /messages/by-id/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
Best regards,
houzj
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

 /* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
 {

I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.
As you know the replorigin_session_setup sets the
replication_state->acquired_by to be the current PID. So without this
change the replorigin_advance rejects that same slot state thinking
that it is already active for a different process. The root problem is
that the same process/PID calling both functions would hang. So this
patch change allows replorigin_advance to be called by the same process.
IIUC that acquired_by check condition is like a sanity check for the
originid passed. The patched code only does just what the comment
says:
"/* Make sure it's not used by somebody else */"
Doesn't "somebody else" mean "anyone but me" (i.e. anyone but MyProcPid)?
Also, "setup" of a thing generally comes before usage of that thing,
so won't it seem strange (as the suggestion implies) to deliberately
call the "setup" function 2nd instead of 1st?
Can you please explain why it is better to do it the suggested way
(switch the calls around) than keep the patch code? Probably there is
a good reason but I am just not understanding it.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Jan 13, 2021 at 1:07 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Also PSA some detailed logging evidence of some test scenarios involving Drop/AlterSubscription:
+ Test-20210112-AlterSubscriptionRefresh-ok.txt = AlterSubscription_refresh which successfully drops a tablesync slot
+ Test-20210112-AlterSubscriptionRefresh-warning.txt = AlterSubscription_refresh gives WARNING that it cannot drop the tablesync slot (which no longer exists)
+ Test-20210112-DropSubscription-warning.txt = DropSubscription with a disassociated slot_name gives a WARNING that it cannot drop the tablesync slot (due to broken connection)

Hi

* The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t. stopping workers for any "removed" tables.

I have a question about the above feature.
With the patch, it seems the worker is not stopped in the case of [1].
I probably missed something; should we stop the worker in such a case?

[1] = /messages/by-id/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
I am not exactly sure of the concern. (If the extra info below does
not help, can you please describe your concern in more detail?)
This [v14] patch code/feature is only referring to the immediate
stopping of only the *** "tablesync" *** worker (if any) for any/each
table being removed from the subscription. It has nothing to say about
the "apply" worker of the subscription, which continues replicating as
before.
OTOH, I think the other mail problem is not really related to the
"tablesync" workers. As you can see (e.g. steps 7,8,9,10 of [2]= /messages/by-id/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com), that
problem is described as continuing over multiple transactions to
replicate unexpected rows - I think this could only be done by the
subscription "apply" worker, and is after the "tablesync" worker has
gone away.
So AFAIK these are 2 quite unrelated problems, and would be solved
independently.
It just happens that they are both exposed using ALTER SUBSCRIPTION
... REFRESH PUBLICATION;
----
[v14] = /messages/by-id/CAHut+PsPO2vOp+P7U2szROMy15PKKGanhUrCYQ0ffpy9zG1V1A@mail.gmail.com
[2] = /messages/by-id/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
I am not exactly sure of the concern. (If the extra info below does
not help, can you please describe your concern in more detail?)

This [v14] patch code/feature is only referring to the immediate stopping
of only the *** "tablesync" *** worker (if any) for any/each table being
removed from the subscription. It has nothing to say about the "apply" worker
of the subscription, which continues replicating as before.

OTOH, I think the other mail problem is not really related to the "tablesync"
workers. As you can see (e.g. steps 7,8,9,10 of [2]), that problem is
described as continuing over multiple transactions to replicate unexpected
rows - I think this could only be done by the subscription "apply" worker,
and is after the "tablesync" worker has gone away.

So AFAIK these are 2 quite unrelated problems, and would be solved
independently.

It just happens that they are both exposed using ALTER SUBSCRIPTION ...
REFRESH PUBLICATION;
So sorry for the confusion, you are right that these are 2 quite unrelated problems.
I misunderstood the 'stop the worker' here.
+ /* Immediately stop the worker. */
+ logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
+ logicalrep_worker_stop(subid, relid); /* stop immediately */
Do you think we can add some comments to describe what type of "worker" is stopped here? (the sync worker)
And should we add some more comments about the reason for the "Immediately stop" here? It may be easier to understand.
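For example, maybe something like this (just a suggested wording, not what the patch currently says):

+ /*
+  * Immediately stop the tablesync worker (if any) for this relation,
+  * and also flag it to be stopped again at commit time so that the
+  * launcher cannot re-launch it before our transaction ends; otherwise
+  * a re-launched sync worker could race with the tablesync slot drop.
+  */
+ logicalrep_worker_stop_at_commit(subid, relid);
+ logicalrep_worker_stop(subid, relid);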
Best regards,
Houzj
On Wed, Jan 13, 2021 at 1:30 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
I am not exactly sure of the concern. (If the extra info below does
not help, can you please describe your concern in more detail?)

This [v14] patch code/feature is only referring to the immediate stopping
of only the *** "tablesync" *** worker (if any) for any/each table being
removed from the subscription. It has nothing to say about the "apply" worker
of the subscription, which continues replicating as before.

OTOH, I think the other mail problem is not really related to the "tablesync"
workers. As you can see (e.g. steps 7,8,9,10 of [2]), that problem is
described as continuing over multiple transactions to replicate unexpected
rows - I think this could only be done by the subscription "apply" worker,
and is after the "tablesync" worker has gone away.

So AFAIK these are 2 quite unrelated problems, and would be solved
independently.

It just happens that they are both exposed using ALTER SUBSCRIPTION ...
REFRESH PUBLICATION;

So sorry for the confusion, you are right that these are 2 quite unrelated problems.
I misunderstood the 'stop the worker' here.

+ /* Immediately stop the worker. */
+ logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
+ logicalrep_worker_stop(subid, relid); /* stop immediately */

Do you think we can add some comments to describe what type of "worker" is stopped here? (the sync worker)
And should we add some more comments about the reason for the "Immediately stop" here? It may be easier to understand.
Another thing related to this: why do we need to call both
logicalrep_worker_stop_at_commit() and logicalrep_worker_stop()?
--
With Regards,
Amit Kapila.
On Wed, Jan 13, 2021 at 11:18 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

 /* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
 {

I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.

As you know, the replorigin_session_setup sets the
replication_state->acquired_by to be the current PID. So without this
change the replorigin_advance rejects that same slot state thinking
that it is already active for a different process. The root problem is
that the same process/PID calling both functions would hang.
I think the hang happens only if we call the unchanged replorigin_advance
after the session_setup API, right?
So this
patch change allows replorigin_advance to be called by the same process.

IIUC that acquired_by check condition is like a sanity check for the
originid passed. The patched code only does just what the comment
says:
"/* Make sure it's not used by somebody else */"
Doesn't "somebody else" mean "anyone but me" (i.e. anyone but MyProcPid)?

Also, "setup" of a thing generally comes before usage of that thing,
so won't it seem strange (as the suggestion implies) to deliberately
call the "setup" function 2nd instead of 1st?

Can you please explain why it is better to do it the suggested way
(switch the calls around) than keep the patch code? Probably there is
a good reason but I am just not understanding it.
Because there is no requirement for origin_advance API to be called
after session setup. Session setup is required to mark the node as
replaying from a remote node, see [1], whereas origin_advance is used
for setting up the initial location or setting a new location, see [2]
(pg_replication_origin_advance).
Now here, after creating the origin, we need to set up the initial
location and it seems fine to call origin_advance before
session_setup. In short, as such, I don't see any problem with your
change in replorigin_advance but OTOH, I don't see the need for the
same as well. So, let's try to avoid that change unless we can't do
without it.
Also, another thing is we need to take RowExclusiveLock on
pg_replication_origin as written in comments atop replorigin_advance
before calling it. See its usage in pg_replication_origin_advance.
Also, write comments on why we need to use replorigin_advance here
(... something, like we need to WAL log this for the purpose of
recovery...).
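To illustrate the ordering and locking I am suggesting, it would look roughly like this (a sketch only, mirroring what pg_replication_origin_advance does for the lock; not committed code):

/* Create the origin and set its initial location *before* acquiring it
 * for this session, so the unmodified acquired_by != 0 check in
 * replorigin_advance() is never tripped by our own PID. */
originid = replorigin_create(originname);

/* Per the comments atop replorigin_advance(), callers must hold
 * RowExclusiveLock on pg_replication_origin. */
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);

/* WAL-log the starting location so it survives recovery. */
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
				   true /* go backward */ , true /* WAL log */ );

/* Only now mark the session as replaying from this origin. */
replorigin_session_setup(originid);
replorigin_session_origin = originid;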
[1]: https://www.postgresql.org/docs/devel/replication-origins.html
[2]: https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION
--
With Regards,
Amit Kapila.
On Tue, Jan 12, 2021 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote:
FYI, I was able to reproduce this case in debugger. PSA logs showing details.
Thanks for reproducing as I was worried about exactly this case. I
have one question related to logs:

##
## ALTER SUBSCRIPTION to REFRESH the publication
## This blocks on some latch until the tablesync worker dies, then it continues
##

Did you check which exact latch or lock blocks this?
I have checked this myself and the command is waiting on the drop of
origin till the tablesync worker is finished because replorigin_drop()
requires state->acquired_by to be 0 which will only be true once the
tablesync worker exits. I think this is the reason you might have
noticed that the command can't be finished until the tablesync worker
died. So this can't be an interlock between ALTER SUBSCRIPTION ..
REFRESH command and the tablesync worker, and to that end it seems you have
the below FIXMEs in the patch:

I have also seen this same blocking reason before in the replorigin_drop().
However, back when I first tested/reproduced the refresh issue
[test-refresh], the AlterSubscription_refresh was still the *original*
unchanged code, so at that time it did not have any replorigin_drop()
in it at all. In any case, in the latest code [v14] the AlterSubscription is
immediately stopping the workers so this question may be moot.

+ * FIXME - Usually this cleanup would be OK, but will not
+ * always be OK because the logicalrep_worker_stop_at_commit
+ * only "flags" the worker to be stopped in the near future
+ * but meanwhile it may still be running. In this case there
+ * could be a race between the tablesync worker and this code
+ * to see who will succeed with the tablesync drop (and the
+ * loser will ERROR).
+ *
+ * FIXME - Also, checking the state is also not guaranteed
+ * correct because state might be TCOPYDONE when we checked
+ * but has since progressed to SYNDONE
+ */
+
+ if (state == SUBREL_STATE_TCOPYDONE)
+ {

I feel this was okay for the earlier code, but now we need to stop the
tablesync workers before trying to drop the slot as we do in
DropSubscription. Now, if we do that then that would fix the race
conditions mentioned in Fixme but still, there are few more things I
am worried about: (a) What if the launcher again starts the tablesync
worker? One idea could be to acquire AccessExclusiveLock on
SubscriptionRelationId as we do in DropSubscription which is not a
very good idea but I can't think of any other good way. (b) the patch
is just checking SUBREL_STATE_TCOPYDONE before dropping the
replication slot but the slot could be created even before that (in
SUBREL_STATE_DATASYNC state). One idea could be we can try to drop the
slot and if we are not able to drop then we can simply continue
assuming it didn't exist.The code was modified in the latest patch [v14] something like as suggested.
The workers for removed tables are now immediately stopped (like
DropSubscription does). Although I did include the AccessExclusiveLock
as (a) suggested, AFAIK this was actually ineffective at preventing
the workers from relaunching.
The reason why it was ineffective is that you are locking
SubscriptionRelationId which is to protect relaunch of apply workers
not tablesync workers. But in its current form, even acquiring the
SubscriptionRelRelationId lock won't serve the purpose because
process_syncing_tables_for_apply() doesn't always acquire it before
relaunching the tablesync workers. However, if we acquire
SubscriptionRelRelationId in process_syncing_tables_for_apply() then
it would prevent relaunch of workers but not sure if that is a good
idea. Can you think of some other way?
Instead, I am using
logicalrep_worker_stop_at_commit to do this - testing shows it as
working ok. Please see the code and latest test logs [v14] for
details.
There is still a window where it can relaunch. Basically, after you
stop the worker in AlterSubscription_refresh and until the commit
happens, the apply worker can relaunch the tablesync workers. I don't see
code-wise how we can protect that. And if the tablesync workers are
restarted after we stopped them, the purpose won't be achieved because
it can recreate or try to reuse the slot which we have dropped.
The other issue with the current code could be that after we drop the
slot and origin, what if the transaction (in which we are doing Alter
Subscription) is rolled back? Basically, the workers will be relaunched
and it would assume that slot should be there but the slot won't be
present. I have thought of dropping the slot at commit time after we
stop the workers but again not sure if that is a good idea because at
that point we don't want to establish the connection with the
publisher.
I think this needs some more thought.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v15 patch for the Tablesync Solution1.
Main differences from v14:
+ Addresses review comment, posted 13/Jan [ak13]
[ak13] = /messages/by-id/CAA4eK1KzNbudfwmJD-ureYigX6sNyCU6YgHscg29xWoZG6osvA@mail.gmail.com
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for
DropSubscription, AlterSubscription_refresh and for
process_syncing_tables_for_sync functions. Drop/AlterSubscription will
issue WARNING instead of ERROR in case the slot drop fails.
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* The AlterSubscription_refresh (v14+) is now more similar to
DropSubscription w.r.t. stopping tablesync workers for any "removed"
tables.
* Updates to PG docs.
TODO / Known Issues:
* The AlterSubscription_refresh tablesync cleanup code still has some
problems [1].

[1] = /messages/by-id/CAA4eK1JuwZF7FHM+EPjWdVh=Xaz-7Eo-G0TByMjWeUU32Xue3w@mail.gmail.com
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v15-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v15-0001-Tablesync-Solution1.patchDownload
From 3e69d48bf311e763fc1e9fd04afd09b9dea358a7 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 14 Jan 2021 15:07:23 +1100
Subject: [PATCH v15] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription, AlterSubscription_refresh and for process_syncing_tables_for_sync functions. Drop/AlterSubscription will issue WARNING instead of ERROR in case the slot drop fails.
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t. stopping tablesync workers for any "removed" tables.
* Updates to PG docs.
TODO / Known Issues:
* The AlterSubscription tablesync cleanup code still has problems [1]
[1] = https://www.postgresql.org/message-id/CAA4eK1JuwZF7FHM%2BEPjWdVh%3DXaz-7Eo-G0TByMjWeUU32Xue3w%40mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 454 ++++++++++++++++++++--------
src/backend/replication/logical/tablesync.c | 259 +++++++++++++---
src/backend/replication/logical/worker.c | 18 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
8 files changed, 567 insertions(+), 193 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..2e46a49 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..f94243b 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -566,100 +567,165 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ Relation rel;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ pubrel_local_oids[off++] = relid;
- pubrel_local_oids[off++] = relid;
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Lock pg_subscription with AccessExclusiveLock to ensure that the
+ * launcher doesn't restart new worker for the ones we are about to
+ * stop.
+ */
+ rel = table_open(SubscriptionRelationId, AccessExclusiveLock);
+
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ Oid subid = sub->oid;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;
+
+ /* Immediately stop the worker. */
+ logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
+ logicalrep_worker_stop(subid, relid); /* stop immediately */
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(subid, relid, &statelsn);
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to
+ * have already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ if (state != SUBREL_STATE_SYNCDONE && state != SUBREL_STATE_READY)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
- {
- RemoveSubscriptionRel(sub->oid, relid);
+ elog(DEBUG1,
+ "AlterSubscription_refresh: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "AlterSubscription_refresh: dropping origin tracking for \"%s\"",
+ originname);
+ replorigin_drop(originid, false);
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
+
+ table_close(rel, NoLock);
+
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
}
/*
@@ -928,8 +994,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1016,100 +1082,220 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
- *
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ if (!wrconn)
+ {
+ /*
+ * It is only possible to reach here without ERROR for a
+ * broken publisher connection if the subscription
+ * slotname is already NONE/NULL.
+ *
+ * This means the user has disassociated the subscription
+ * from the replication slot deliberately so that the DROP
+ * SUBSCRIPTION can proceed to completion. See
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only give a WARNING message that
+ * the tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP SUBSCRIPTION
+ * from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(WARNING,
+ "no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(DEBUG1,
+ "DropSubscription: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1,
+ "DropSubscription: dropping origin tracking for \"%s\"",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..33e11a1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -270,30 +277,59 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+ elog(DEBUG1,
+ "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +448,37 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +875,30 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints on
+ * slot name length.
+ *
+ * The returned slot name is either returned in the supplied buffer or
+ * palloc'ed in current memory context (if NULL buffer).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname)
+{
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +915,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +942,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL); /* use palloc */
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +959,32 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1000,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1025,99 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ pfree(slotname);
+ slotname = NULL;
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ originid = replorigin_create(originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
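As an aside for reviewers: the crash-restart behaviour added above can be summarized by the following standalone sketch. The state codes come from the pg_subscription_rel.h hunk in this patch; the printfs merely stand in for the real slot/COPY/origin logic in tablesync.c, so this is an illustration, not patch code:

#include <stdio.h>

#define SUBREL_STATE_INIT			'i'
#define SUBREL_STATE_DATASYNC		'd'
#define SUBREL_STATE_FINISHEDCOPY	'f' /* new in this patch */

static void
start_tablesync(char relstate)
{
	if (relstate == SUBREL_STATE_FINISHEDCOPY)
	{
		/*
		 * A previous attempt crashed after the COPY: skip the copy and
		 * resume streaming from the origin's recorded progress.
		 */
		printf("state '%c': bypass copy, resume from origin progress\n",
			   relstate);
		return;
	}

	/*
	 * First attempt: create the permanent slot, do the COPY, create and
	 * advance the origin, then persist SUBREL_STATE_FINISHEDCOPY.
	 */
	printf("state '%c': create slot, copy table, then set state 'f'\n",
		   relstate);
}

int
main(void)
{
	start_tablesync(SUBREL_STATE_INIT);			/* fresh table */
	start_tablesync(SUBREL_STATE_FINISHEDCOPY);	/* relaunch after a crash */
	return 0;
}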
v15-0002-Tablesync-extra-logging.patch (application/octet-stream)
From 6fa2654b1f3c3a1ba963db4e385afcc8c9fa47d6 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 14 Jan 2021 15:55:23 +1100
Subject: [PATCH v15] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 29 ++++++++++++++++++----
src/backend/replication/logical/tablesync.c | 37 +++++++++++++++++++++++++----
2 files changed, 56 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index f94243b..b5f9d56 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -665,11 +665,18 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
XLogRecPtr statelsn;
/* Immediately stop the worker. */
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: before logicalrep_worker_stop");
logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
logicalrep_worker_stop(subid, relid); /* stop immediately */
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: after logicalrep_worker_stop");
/* Last known rel state. */
state = GetSubscriptionRelState(subid, relid, &statelsn);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: relid %u had state %c",
+ relid, state);
RemoveSubscriptionRel(sub->oid, relid);
@@ -692,10 +699,13 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- elog(DEBUG1,
+ elog(LOG,
"AlterSubscription_refresh: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/* Remove the tablesync's origin tracking if exists. */
@@ -703,13 +713,16 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
+ elog(LOG,
"AlterSubscription_refresh: dropping origin tracking for \"%s\"",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"",
+ originname);
}
- ereport(DEBUG1,
+ ereport(LOG,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
@@ -1191,10 +1204,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
else
{
- elog(DEBUG1,
+ elog(LOG,
"DropSubscription: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> DropSubscription: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
}
@@ -1203,10 +1219,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1,
+ elog(LOG,
"DropSubscription: dropping origin tracking for \"%s\"",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> DropSubscription: dropped origin tracking for \"%s\"",
+ originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 33e11a1..80750ad 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -300,10 +300,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
MyLogicalRepWorker->relid,
syncslotname);
- elog(DEBUG1,
+ elog(LOG,
"process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -469,10 +472,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
+ elog(LOG,
"process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
}
@@ -971,7 +977,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
@@ -979,8 +985,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1029,6 +1041,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1060,10 +1075,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1084,11 +1102,20 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* replication origin from vanishing while advancing.
*/
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1111,7 +1138,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
--
1.8.3.1
On Wed, Jan 13, 2021 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 13, 2021 at 11:18 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
 	LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

 	/* Make sure it's not used by somebody else */
- 	if (replication_state->acquired_by != 0)
+ 	if (replication_state->acquired_by != 0 &&
+ 		replication_state->acquired_by != MyProcPid)
 	{

I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.

As you know, the replorigin_session_setup sets the
replication_state->acquired_by to be the current PID. So without this
change the replorigin_advance rejects that same slot state, thinking
that it is already active for a different process. The root problem is
that the same process/PID calling both functions would hang.

I think the hang happens only if we call the unchanged replorigin_advance
after the session_setup API, right?

So this patch change allows the replorigin_advance code to be called by
the same process.

IIUC that acquired_by check condition is like a sanity check for the
originid passed. The patched code only does what the comment says:
"/* Make sure it's not used by somebody else */"
Doesn't "somebody else" mean "anyone but me" (i.e. anyone but MyProcPid)?

Also, "setup" of a thing generally comes before usage of that thing,
so won't it seem strange to (like the suggestion) deliberately call the
"setup" function 2nd instead of 1st?

Can you please explain why it is better to do it the suggested way
(switch the calls around) than keep the patch code? Probably there is
a good reason but I am just not understanding it.

Because there is no requirement for the origin_advance API to be called
after session setup. Session setup is required to mark the node as
replaying from a remote node, see [1], whereas origin_advance is used
for setting up the initial location or setting a new location, see [2]
(pg_replication_origin_advance).

Now here, after creating the origin, we need to set up the initial
location and it seems fine to call origin_advance before
session_setup. In short, as such, I don't see any problem with your
change in replorigin_advance but OTOH, I don't see the need for it
either. So, let's try to avoid that change unless we can't do
without it.

Also, another thing: we need to take RowExclusiveLock on
pg_replication_origin, as written in the comments atop replorigin_advance,
before calling it. See its usage in pg_replication_origin_advance.
Also, write comments on why we need to use replorigin_advance here
(... something like: we need to WAL log this for the purpose of
recovery ...).
Modified in latest patch [v15].
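For clarity, the ordering that v15 ends up with is sketched below. The replorigin_* stubs just print, so this compiles and runs outside the backend; in the real patch they are the functions in src/backend/replication/logical/origin.c, and the create/advance pair is done while holding RowExclusiveLock on pg_replication_origin. The origin name and LSN are invented examples:

#include <stdio.h>

typedef unsigned int RepOriginId;

/* Stand-ins for the backend functions, for illustration only. */
static RepOriginId
replorigin_create(const char *name)
{
	printf("create origin %s\n", name);
	return 1;
}

static void
replorigin_advance(RepOriginId id, unsigned long startpos)
{
	printf("advance origin %u to %lX (WAL logged, for recovery)\n",
		   id, startpos);
}

static void
replorigin_session_setup(RepOriginId id)
{
	printf("session setup: acquired_by becomes my PID\n");
}

int
main(void)
{
	unsigned long origin_startpos = 0x16B3748;	/* from walrcv_create_slot */
	RepOriginId originid;

	/*
	 * Advance BEFORE session setup: once session setup has marked the
	 * origin as acquired by this PID, the unchanged replorigin_advance
	 * would reject it as "used by somebody else".
	 */
	originid = replorigin_create("pg_16394_16385");
	replorigin_advance(originid, origin_startpos);
	replorigin_session_setup(originid);
	return 0;
}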
----
[v15] = /messages/by-id/CAHut+Pu3he2rOWjbXcNUO6z3aH2LYzW03KV+fiMWim49qW9etQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Jan 13, 2021 at 5:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 12, 2021 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
The workers for removed tables are now immediately stopped (like
DropSubscription does). Although I did include the AccessExclusiveLock
as (a) suggested, AFAIK this was actually ineffective at preventing
the workers relaunching.

The reason why it was ineffective is that you are locking
SubscriptionRelationId, which is to protect relaunch of apply workers,
not tablesync workers. But in its current form even acquiring the
SubscriptionRelRelationId lock won't serve the purpose because
process_syncing_tables_for_apply() doesn't always acquire it before
relaunching the tablesync workers. However, if we acquire
SubscriptionRelRelationId in process_syncing_tables_for_apply() then
it would prevent relaunch of workers, but I am not sure if that is a
good idea. Can you think of some other way?

Instead, I am using logicalrep_worker_stop_at_commit to do this -
testing shows it as working ok. Please see the code and latest test
logs [v14] for details.

There is still a window where it can relaunch. Basically, after you
stop the worker in AlterSubscription_refresh and till the commit
happens, the apply worker can relaunch the tablesync workers. I don't
see, code-wise, how we can protect that. And if the tablesync workers
are restarted after we stopped them, the purpose won't be achieved
because they can recreate or try to reuse the slot which we have
dropped.

The other issue with the current code could be: after we drop the slot
and origin, what if the transaction (in which we are doing the Alter
Subscription) is rolled back? Basically, the workers will be relaunched
and would assume that the slot should be there, but the slot won't be
present. I have thought of dropping the slot at commit time after we
stop the workers, but again I am not sure if that is a good idea
because at that point we don't want to establish the connection with
the publisher.

I think this needs some more thought.
I have another idea to solve this problem. Instead of Alter
Subscription drop the slot/origin, we can let tablesync worker do it.
Basically, we need to register SignalHandlerForShutdownRequest as
SIGTERM handler and then later need to check ShutdownRequestPending
flag in the tablesync worker. If the flag is set, then we can drop the
slot/origin and allow the process to exit cleanly.
This will obviate the need to take the lock and avoid all sorts of
rollback problems. If this works out well then I think we can use this
for DropSubscription as well, but that is a matter for a separate patch.
Thoughts?
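(To make the suggestion concrete, here is a minimal standalone demo of the flag pattern. The handler and flag mimic SignalHandlerForShutdownRequest / ShutdownRequestPending from src/backend/postmaster/interrupt.c, and the printf stands in for the slot/origin drop the tablesync worker would do; this is a sketch of the idea, not the eventual patch code:)

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t ShutdownRequestPending = 0;

static void
handle_sigterm(int signo)
{
	/* Like SignalHandlerForShutdownRequest: just set a flag. */
	ShutdownRequestPending = 1;
}

int
main(void)
{
	signal(SIGTERM, handle_sigterm);

	for (;;)
	{
		if (ShutdownRequestPending)
		{
			/* The real worker would drop its own slot and origin here. */
			printf("cleanup slot/origin, exit cleanly\n");
			return 0;
		}
		pause();				/* wait for the next signal */
	}
}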
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v16 patch for the Tablesync Solution1.
Main differences from v15:
+ Tablesync cleanups of DropSubscription/AlterSubscription_refresh are
re-implemented via a ProcessInterrupts cleanup function
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker).
The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added to the
process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up in
process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from
ProcessInterrupts. This is indirectly invoked by
DropSubscription/AlterSubscription when they signal the tablesync
worker to stop (see the sketch below).
* Updates to PG docs.
TODO / Known Issues:
* Race condition observed in "make check" may be related to this patch.
* Add test cases.
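As a rough standalone illustration of that stop-and-self-cleanup flow (plain POSIX, not backend code: fork/kill stand in for the launcher and logicalrep_worker_stop, and the printfs for the slot/origin drop):

#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t shutdown_requested = 0;

static void
on_sigterm(int signo)
{
	shutdown_requested = 1;
}

int
main(void)
{
	pid_t		child = fork();

	if (child == 0)
	{
		/* "tablesync worker": run until told to stop. */
		signal(SIGTERM, on_sigterm);
		while (!shutdown_requested)
			usleep(10 * 1000);	/* pretend to apply changes */
		printf("worker: dropping own slot/origin, exiting cleanly\n");
		_exit(0);
	}

	/* "DROP/ALTER SUBSCRIPTION side": signal the worker, then wait. */
	sleep(1);					/* let the child install its handler */
	kill(child, SIGTERM);
	waitpid(child, NULL, 0);
	printf("ddl side: worker stopped after its self-cleanup\n");
	return 0;
}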
---
Please also see some test scenario logging which shows the new
tablesync cleanup function getting called as a result of
Drop/AlterSubscription.
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v16-0002-Tablesync-extra-logging.patch (application/octet-stream)
From f17de30cdf84aacd8975d1b3c2ed102a605887fa Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 18 Jan 2021 21:36:27 +1100
Subject: [PATCH v16] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/replication/logical/tablesync.c | 53 +++++++++++++++++++++++++----
src/backend/tcop/postgres.c | 2 ++
2 files changed, 49 insertions(+), 6 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index cde3297..21f686c 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -282,8 +282,8 @@ tablesync_cleanup_at_interrupt(void)
Oid subid = MySubscription->oid;
Oid relid = MyLogicalRepWorker->relid;
- elog(DEBUG1,
- "tablesync_cleanup_at_interrupt for relid = %d",
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt for relid = %d",
MyLogicalRepWorker->relid);
@@ -324,7 +324,13 @@ tablesync_cleanup_at_interrupt(void)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/*
@@ -341,7 +347,13 @@ tablesync_cleanup_at_interrupt(void)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropping origin tracking for \"%s\"",
+ originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropped origin tracking for \"%s\"",
+ originname);
/*
* CommitTransactionCommand would normally attempt to advance the origin
* but now that the origin is dropped that would fail, so we need to reset
@@ -387,10 +399,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
*/
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- elog(DEBUG1,
+ elog(LOG,
"process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -559,10 +574,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
+ elog(LOG,
"process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
}
@@ -1061,14 +1079,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1117,6 +1143,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1148,10 +1177,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1172,11 +1204,20 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* replication origin from vanishing while advancing.
*/
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1199,7 +1240,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ede6c2c..d267eee 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3084,6 +3084,8 @@ ProcessInterrupts(void)
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
{
+ elog(LOG, "!!>> ProcessInterrupts: Hello, I am a LogicalWorker");
+
/* Tablesync workers do their own cleanups. */
if (IsLogicalWorkerTablesync())
tablesync_cleanup_at_interrupt(); /* does not return. */
--
1.8.3.1
v16-0001-Tablesync-Solution1.patch (application/octet-stream)
From 879dce01d634ee488a4985c42ef4e392d0d0bfda Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 18 Jan 2021 21:23:59 +1100
Subject: [PATCH v16] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker now allows multiple transactions instead of a single transaction
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added to the process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up in process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from ProcessInterrupts. This is indirectly invoked by DropSubscription/AlterSubscription when they signal the tablesync worker to stop.
* Updates to PG docs.
TODO / Known Issues:
* Race condition observed in "make check" may be related to this patch.
* Add test cases.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 321 ++++++++++++++-----------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 347 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 27 +--
src/backend/tcop/postgres.c | 6 +
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/logicalworker.h | 2 +
src/include/replication/slot.h | 3 +
11 files changed, 537 insertions(+), 197 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..2e46a49 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..0363ad1 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -570,96 +571,103 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
-
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
-
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
+ PG_TRY();
{
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- pubrel_local_oids[off++] = relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
- {
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ pubrel_local_oids[off++] = relid;
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ for (off = 0; off < list_length(subrel_states); off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid relid = subrel_local_oids[off];
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ RemoveSubscriptionRel(sub->oid, relid);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ logicalrep_worker_stop_at_commit(sub->oid, relid);
+
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
}
/*
@@ -928,7 +936,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,100 +1023,140 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
}
- list_free(subworkers);
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ table_close(rel, NoLock);
+ }
+ PG_END_TRY();
+}
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue a WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
- {
- table_close(rel, NoLock);
- return;
- }
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 77781d0..304c879 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -357,7 +357,7 @@ restart:
if (state->roident == roident)
{
/* found our slot, is it busy? */
- if (state->acquired_by != 0)
+ if (state->acquired_by != 0 && state->acquired_by != MyProcPid)
{
ConditionVariable *cv;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..cde3297 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +102,15 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -260,6 +268,93 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
}
/*
+ * The sync worker cleans up any slot / origin resources it may have created.
+ * This function is called from ProcessInterrupts() as a result of the
+ * tablesync worker being signalled.
+ */
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);
+
+
+ /*
+ * Cleanup the tablesync slot, if needed.
+ *
+ * If state is SYNCDONE or READY then the slot has already been dropped.
+ */
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
+
+ /*
+ * End wal streaming so that wrconn can be re-used to drop the
+ * slot.
+ */
+ PG_TRY();
+ {
+ walrcv_endstreaming(wrconn, &tli);
+ }
+ PG_CATCH();
+ {
+ /*
+ * It is possible that the walrcv_startstreaming was not yet
+ * called (e.g. the interrupt initiating this cleanup may have
+ * happened during the table COPY phase). So suppress any error
+ * here to cope with that scenario.
+ */
+ }
+ PG_END_TRY();
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
+
+ /*
+ * Remove the tablesync's origin tracking if exists.
+ *
+ * The origin APIs must be called within a transaction.
+ * The transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ replorigin_drop(originid, false);
+ /*
+ * CommitTransactionCommand would normally attempt to advance the origin
+ * but now that the origin is dropped that would fail, so we need to reset
+ * the replorigin_session as well to prevent that.
+ */
+ replorigin_session_reset();
+ replorigin_session_origin = InvalidRepOriginId;
+ }
+
+ finish_sync_worker(); /* doesn't return. */
+}
+
+/*
* Handle table synchronization cooperation from the synchronization
* worker.
*
@@ -270,30 +365,58 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ elog(DEBUG1,
+ "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +535,40 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that would
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * FIXME - The above comment is no longer correct because a fix was
+ * needed to allow the drop during shutdown, so maybe this should be
+ * moved back into the tablesync code.
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +965,30 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either copied into the supplied buffer or
+ * palloc'ed in the current memory context (if the buffer is NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname)
+{
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +1005,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +1032,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL); /* use palloc */
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +1049,30 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1088,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1113,99 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN obtained from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ originid = replorigin_create(originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
- /* Make the copy visible. */
- CommandCounterIncrement();
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..ff95f09 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
@@ -3111,3 +3103,12 @@ IsLogicalWorker(void)
{
return MyLogicalRepWorker != NULL;
}
+
+/*
+ * Is current process a logical replication tablesync worker?
+ */
+bool
+IsLogicalWorkerTablesync(void)
+{
+ return am_tablesync_worker();
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2805568..ede6c2c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3083,9 +3083,15 @@ ProcessInterrupts(void)
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
+ {
+ /* Tablesync workers do their own cleanups. */
+ if (IsLogicalWorkerTablesync())
+ tablesync_cleanup_at_interrupt(); /* does not return. */
+
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating logical replication worker due to administrator command")));
+ }
else if (IsLogicalLauncher())
{
ereport(DEBUG1,
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/logicalworker.h b/src/include/replication/logicalworker.h
index 2ad61a0..085916c 100644
--- a/src/include/replication/logicalworker.h
+++ b/src/include/replication/logicalworker.h
@@ -15,5 +15,7 @@
extern void ApplyWorkerMain(Datum main_arg);
extern bool IsLogicalWorker(void);
+extern bool IsLogicalWorkerTablesync(void);
+extern void tablesync_cleanup_at_interrupt(void);
#endif /* LOGICALWORKER_H */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Main differences from v16:
+ Small refactor for DropSubscription to correct the "make check" deadlock
+ Added test case
+ Some comment wording
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker now allows multiple transactions instead of a
single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase (see the
state-progression sketch after this list).
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply
worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added for the
process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up in
process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from
ProcessInterrupts. This is indirectly invoked by
DropSubscription/AlterSubscription when they signal the tablesync
worker to stop.
* Updates to PG docs.
* New TAP test case
TODO / Known Issues:
* None known.
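As a reading aid, here is a tiny standalone C sketch (not PostgreSQL code) of the relstate progression described in the tablesync.c header comment, including the new FINISHEDCOPY step. The 'w' and 'c' letter codes for the in-memory SYNCWAIT and CATCHUP states are my assumption; the catalog codes i/d/f/s/r come from the patch:

#include <stdio.h>

typedef enum
{
	STATE_INIT = 'i',
	STATE_DATASYNC = 'd',
	STATE_FINISHEDCOPY = 'f',	/* new in this patch */
	STATE_SYNCWAIT = 'w',		/* in-memory only */
	STATE_CATCHUP = 'c',		/* in-memory only */
	STATE_SYNCDONE = 's',
	STATE_READY = 'r'
} RelState;

static RelState
next_state(RelState s)
{
	switch (s)
	{
		case STATE_INIT: return STATE_DATASYNC;
		case STATE_DATASYNC: return STATE_FINISHEDCOPY;
		case STATE_FINISHEDCOPY: return STATE_SYNCWAIT;
		case STATE_SYNCWAIT: return STATE_CATCHUP;
		case STATE_CATCHUP: return STATE_SYNCDONE;
		default: return STATE_READY;
	}
}

int
main(void)
{
	RelState	s = STATE_INIT;

	/* Prints: i -> d -> f -> w -> c -> s -> r */
	while (s != STATE_READY)
	{
		printf("%c -> ", (char) s);
		s = next_state(s);
	}
	printf("%c\n", (char) s);
	return 0;
}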
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Attachment: v17-0002-Tablesync-extra-logging.patch (application/octet-stream)
From b6ee12ca3d2c606bc4c6294a35bbe6ba53f3a74f Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 19 Jan 2021 19:23:35 +1100
Subject: [PATCH v17] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not intended for committing.
---
src/backend/replication/logical/tablesync.c | 62 ++++++++++++++++++++++++-----
src/backend/tcop/postgres.c | 2 +
2 files changed, 53 insertions(+), 11 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ec85c08..388f0da 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -283,8 +283,8 @@ tablesync_cleanup_at_interrupt(void)
Oid subid = MySubscription->oid;
Oid relid = MyLogicalRepWorker->relid;
- elog(DEBUG1,
- "tablesync_cleanup_at_interrupt for relid = %d",
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt for relid = %d",
MyLogicalRepWorker->relid);
/*
@@ -322,7 +322,13 @@ tablesync_cleanup_at_interrupt(void)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/*
@@ -339,8 +345,13 @@ tablesync_cleanup_at_interrupt(void)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropping origin tracking for \"%s\"",
+ originname);
replorigin_drop(originid, false);
-
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropped origin tracking for \"%s\"",
+ originname);
/*
* CommitTransactionCommand would normally attempt to advance the
* origin, but now that the origin has been dropped that would fail,
@@ -387,10 +398,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
*/
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- elog(DEBUG1,
- "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -552,10 +566,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
- "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
}
@@ -1054,14 +1071,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1110,6 +1135,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1141,10 +1169,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1,
- "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1165,11 +1196,20 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* replication origin from vanishing while advancing.
*/
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1192,8 +1232,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
- "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2a0565e..0f3e32a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3087,6 +3087,8 @@ ProcessInterrupts(void)
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
{
+ elog(LOG, "!!>> ProcessInterrupts: Hello, I am a LogicalWorker");
+
/* Tablesync workers do their own cleanups. */
if (IsLogicalWorkerTablesync())
tablesync_cleanup_at_interrupt(); /* does not return. */
--
1.8.3.1
Attachment: v17-0001-Tablesync-Solution1.patch (application/octet-stream)
From 0bee8be5428253da2b7f6a6808d321b6976b4204 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 19 Jan 2021 19:07:35 +1100
Subject: [PATCH v17] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added for the process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up in process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from ProcessInterrupts. This is indirectly invoked by DropSubscription/AlterSubscription when they signal the tablesync worker to stop.
* Updates to PG docs.
* New TAP test case
TODO / Known Issues:
* None known.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 166 ++++++++------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 340 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 27 +--
src/backend/tcop/postgres.c | 6 +
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/logicalworker.h | 2 +
src/include/replication/slot.h | 3 +
src/test/subscription/t/004_sync.pl | 96 +++++++-
12 files changed, 540 insertions(+), 128 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1a..82e74e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7662,6 +7662,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..03cf91e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1015,101 +1015,133 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
- /*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
- *
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
- */
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
-
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
-
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
-
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ {
+ load_file("libpqwalreceiver", false);
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ }
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue a WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 77781d0..304c879 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -357,7 +357,7 @@ restart:
if (state->roident == roident)
{
/* found our slot, is it busy? */
- if (state->acquired_by != 0)
+ if (state->acquired_by != 0 && state->acquired_by != MyProcPid)
{
ConditionVariable *cv;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..ec85c08 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -260,6 +269,92 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
}
/*
+ * The sync worker cleans up any slot / origin resources it may have created.
+ * This function is called from ProcessInterrupts() as a result of the tablesync
+ * worker being signalled.
+ */
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);
+
+ /*
+ * Cleanup the tablesync slot, if needed.
+ *
+ * If state is SYNCDONE or READY then the slot has already been dropped.
+ */
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is missing. */
+
+ /*
+ * End wal streaming so the wrconn can be re-used to drop the slot.
+ */
+ PG_TRY();
+ {
+ walrcv_endstreaming(wrconn, &tli);
+ }
+ PG_CATCH();
+ {
+ /*
+ * It is possible that the walrcv_startstreaming was not yet
+ * called (e.g. the interrupt initiating this cleanup may have
+ * happened during the table COPY phase) so suppress any error
+ * here to cope with that scenario.
+ */
+ }
+ PG_END_TRY();
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
+
+ /*
+ * Remove the tablesync's origin tracking if it exists.
+ *
+ * The origin APIs must be called within a transaction, and this
+ * transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ replorigin_drop(originid, false);
+
+ /*
+ * CommitTransactionCommand would normally attempt to advance the
+ * origin, but now that the origin has been dropped that would fail,
+ * so we need to reset the replorigin_session here to prevent this
+ * error happening.
+ */
+ replorigin_session_reset();
+ replorigin_session_origin = InvalidRepOriginId;
+ }
+
+ finish_sync_worker(); /* doesn't return. */
+}
+
+/*
* Handle table synchronization cooperation from the synchronization
* worker.
*
@@ -270,30 +365,58 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ elog(DEBUG1,
+ "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +535,33 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own
+ * origin, that would prevent the origin from advancing properly
+ * on commit of the transaction.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +958,30 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The slot name is returned in the supplied buffer, or palloc'ed in the
+ * current memory context if the buffer is NULL.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname)
+{
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +998,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +1025,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL); /* use palloc */
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +1042,30 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1081,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1106,99 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN obtained from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ originid = replorigin_create(originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
- /* Make the copy visible. */
- CommandCounterIncrement();
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index f2b2549..6482dd6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
@@ -3112,3 +3104,12 @@ IsLogicalWorker(void)
{
return MyLogicalRepWorker != NULL;
}
+
+/*
+ * Is current process a logical replication tablesync worker?
+ */
+bool
+IsLogicalWorkerTablesync(void)
+{
+ return am_tablesync_worker();
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8dab9fd..2a0565e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3086,9 +3086,15 @@ ProcessInterrupts(void)
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
+ {
+ /* Tablesync workers do their own cleanups. */
+ if (IsLogicalWorkerTablesync())
+ tablesync_cleanup_at_interrupt(); /* does not return. */
+
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating logical replication worker due to administrator command")));
+ }
else if (IsLogicalLauncher())
{
ereport(DEBUG1,
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/logicalworker.h b/src/include/replication/logicalworker.h
index 2ad61a0..085916c 100644
--- a/src/include/replication/logicalworker.h
+++ b/src/include/replication/logicalworker.h
@@ -15,5 +15,7 @@
extern void ApplyWorkerMain(Datum main_arg);
extern bool IsLogicalWorker(void);
+extern bool IsLogicalWorkerTablesync(void);
+extern void tablesync_cleanup_at_interrupt(void);
#endif /* LOGICALWORKER_H */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..ba96d1d 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +151,99 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep;");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
# recreate the subscription, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is pg_'suboid'_sync_'tableoid'.
+my $slotname = 'pg_' . $subid . '_' . 'sync_' . $relid;
+
# temporarily, create a slot having the same name as the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 10 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_subscriber->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The expected output did not appear within ~10 seconds. Give up
+ return 0;
+}
--
1.8.3.1
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Thanks for the updated patch. Below are a few comments:
1. Why are we changing the scope of PG_TRY in DropSubscription()?
Also, it might be better to keep the replication slot drop part as it
is.
2.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE?
3.
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);
The function name and message make it sound like we drop the slot
and origin at any interrupt. Isn't it better to name it as
tablesync_cleanup_at_shutdown()?
4.
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is missing. */
I think we can avoid using the missing_ok and drop_slot_needed variables.
5. Can we drop the origin along with the slot in
process_syncing_tables_for_sync() instead of
process_syncing_tables_for_apply()? I think this is possible because
of the other changes you made in origin.c. Also, if possible, we can
try to use the same code to drop the slot and origin in
tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync.
6.
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
There seems to be a typo (crashed/etc) in the above comment.
7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";
In this test, isn't it better to check for the datasync state like below?
004_sync.pl has other similar tests.
my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
or die "Timed out while waiting for subscriber to start sync";
Is there a reason why we can't use the existing way to check for
failure in this case?
--
With Regards,
Amit Kapila.
On Thu, Jan 21, 2021 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Thanks for the updated patch. Below are a few comments:
One more comment:
In LogicalRepSyncTableStart(), you are trying to remove the slot on
failure of the copy, which won't work if the publisher is down. If
that happens, then on restart of the tablesync worker we will retry
creating the slot with the same name, and it will fail because the
previous slot has still not been removed from the publisher. I think
the same problem can happen if, after an error in the tablesync
worker, we drop the subscription before the tablesync worker gets a
chance to restart. So, to avoid these problems, can we use a
TEMPORARY slot for tablesync workers as previously? If I remember
correctly, the main problem was that we don't know where to start
decoding if we fail in the catchup phase. But for that, origins
should be sufficient: if we fail before the copy then we have to
create a new slot and origin anyway, but if we fail after the copy
then we can use the start_decoding_position from the origin. So
before the copy we still need to use CRS_USE_SNAPSHOT while creating
a temporary slot, but if we are already in FINISHEDCOPY state at the
start of the tablesync worker then we can create a slot with the
CRS_NOEXPORT_SNAPSHOT option, use the origin's start_pos, and proceed
decoding changes from that point onwards, similar to how the apply
worker currently works.
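To make that flow concrete, here is a minimal sketch of how the
tablesync worker could choose the slot-creation option and start
position. This is a sketch only, not the final patch; it reuses the
walrcv/replorigin calls that appear in the patches on this thread:

    if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
    {
        /* Copy already done; no snapshot needed, restart from the origin. */
        walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
                           CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
        originid = replorigin_by_name(originname, false /* missing_ok */ );
        replorigin_session_setup(originid);
        *origin_startpos = replorigin_session_get_progress(false);
    }
    else
    {
        /* Copy not yet done; use the snapshot to make the final data consistent. */
        walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
                           CRS_USE_SNAPSHOT, origin_startpos);
    }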
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v18 patch for the Tablesync Solution1.
Main differences from v17:
+ Design change to use TEMPORARY tablesync slots [ak0122] means much
of the v17 slot cleanup code became unnecessary.
+ Small refactor in LogicalRepSyncTableStart to fix a deadlock scenario.
+ Addressing some review comments [ak0121].
[ak0121] /messages/by-id/CAA4eK1LGxuB_RTfZ2HLJT76wv=FLV6UPqT+FWkiDg61rvQkkmQ@mail.gmail.com
[ak0122] /messages/by-id/CAA4eK1LS0_mdVx2zG3cS+H88FJiwyS3kZi7zxijJ_gEuw2uQ2g@mail.gmail.com
====
Features:
* The tablesync slot name is no longer tied to the Subscription slot name
(see the naming sketch below).
* The tablesync worker now allows multiple transactions instead of a single
transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case
Known Issues:
* None.
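For reference, these are the slot and origin names the patch derives
from the subscription and table OIDs (a small sketch using the same
formats as the patch code):

    /* tablesync slot name: pg_<suboid>_sync_<relid> */
    char       *slotname = psprintf("pg_%u_sync_%u", suboid, relid);

    /* tablesync origin name: pg_<suboid>_<relid> */
    char        originname[NAMEDATALEN];

    snprintf(originname, sizeof(originname), "pg_%u_%u", suboid, relid);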
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v18-0001-Tablesync-Solution1.patch (application/octet-stream)
From b5b2b5335bacb8c924320813a78aeb94cea66e0a Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Sat, 23 Jan 2021 09:53:21 +1100
Subject: [PATCH v18] Tablesync Solution1.
====
Features:
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 39 ++++++
src/backend/replication/logical/tablesync.c | 183 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 1 +
src/test/subscription/t/004_sync.pl | 96 ++++++++++++++-
7 files changed, 300 insertions(+), 40 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1a..82e74e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7662,6 +7662,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..58f8a86 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -649,10 +649,22 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
if (!bsearch(&relid, pubrel_local_oids,
list_length(pubrel_names), sizeof(Oid), oid_cmp))
{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
RemoveSubscriptionRel(sub->oid, relid);
logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if it exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false /* nowait */ );
+ }
+
ereport(DEBUG1,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
@@ -930,6 +942,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1055,32 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (origins).
+ *
+ * Any READY-state relations have already done this.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
+
+ /* Remove the tablesync's origin tracking if it exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false /* nowait */ );
+ }
+ }
+ list_free(rstates);
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..1f828fc 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * with this (non-memory) state then the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -270,20 +279,35 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -292,8 +316,6 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -404,6 +426,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -412,6 +437,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal case origin drop must be done here, not in the
+ * process_syncing_tables_for_sync function, because if the
+ * tablesync worker process attempted to drop its own origin
+ * then it would fail (origin is "busy").
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false /* nowait */ );
+ }
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +854,22 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is palloc'ed in the current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +886,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +913,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +929,38 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Slot creation passes a NULL lsn because the origin startpos is
+ * obtained from origin tracking this time, not from the slot.
+ */
+ walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
+ CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +976,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -942,6 +1025,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN obtained from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index f2b2549..0701a9b 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..c784282 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -211,6 +211,7 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..ba96d1d 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +151,99 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep;");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
# recreate the subscription, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is pg_'suboid'_sync_'tableoid'.
+my $slotname = 'pg_' . $subid . '_' . 'sync_' . $relid;
+
# temporarily, create a slot having the same name as the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 10 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_subscriber->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The expected output did not appear within ~10 seconds. Give up
+ return 0;
+}
--
1.8.3.1
v18-0002-Tablesync-extra-logging.patch (application/octet-stream)
From 22f471614e150c0a07833592ae39f3ed73429b03 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Sat, 23 Jan 2021 10:13:54 +1100
Subject: [PATCH v18] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 10 ++++++----
src/backend/replication/logical/tablesync.c | 31 +++++++++++++++++++++++++++--
2 files changed, 35 insertions(+), 6 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 58f8a86..352c3bf 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -661,12 +661,13 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG, "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"", originname);
}
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ ereport(LOG,
+ (errmsg("!!>> table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
sub->name)));
@@ -1075,8 +1076,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG, "!!>> DropSubscription: droppped origin tracking for \"%s\"", originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1f828fc..b531516 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -449,10 +449,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
- "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
/*
@@ -941,12 +944,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* Slot creation passes NULL lsn because the origin startpos is got
* from origin tracking this time, not from the slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
@@ -955,8 +963,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1005,6 +1019,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1036,13 +1053,23 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* logged for the purpose of recovery. Locks are to prevent the
* replication origin from vanishing while advancing.
*/
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
--
1.8.3.1
On Fri, Jan 22, 2021 at 1:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jan 21, 2021 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Thanks for the updated patch. Below are a few comments:
One more comment:
In LogicalRepSyncTableStart(), you are trying to remove the slot on
failure of the copy, which won't work if the publisher is down. If
that happens, then on restart of the tablesync worker we will retry
creating the slot with the same name, and it will fail because the
previous slot has still not been removed from the publisher. I think
the same problem can happen if, after an error in the tablesync
worker, we drop the subscription before the tablesync worker gets a
chance to restart. So, to avoid these problems, can we use a
TEMPORARY slot for tablesync workers as previously? If I remember
correctly, the main problem was that we don't know where to start
decoding if we fail in the catchup phase. But for that, origins
should be sufficient: if we fail before the copy then we have to
create a new slot and origin anyway, but if we fail after the copy
then we can use the start_decoding_position from the origin. So
before the copy we still need to use CRS_USE_SNAPSHOT while creating
a temporary slot, but if we are already in FINISHEDCOPY state at the
start of the tablesync worker then we can create a slot with the
CRS_NOEXPORT_SNAPSHOT option, use the origin's start_pos, and proceed
decoding changes from that point onwards, similar to how the apply
worker currently works.
OK. Code is modified as suggested in the latest patch [v18].
Now that tablesync slots are temporary, quite a lot of cleanup code
from the previous patch (v17) is no longer required, so it has been
removed.
----
[v18] = /messages/by-id/CAHut+Pvm0R=Mn_uVN_JhK0scE54V6+EDGHJg1WYJx0Q8HX_mkQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Thanks for the updated patch. Below are a few comments:
1. Why are we changing the scope of PG_TRY in DropSubscription()?
Also, it might be better to keep the replication slot drop part as it
is.
The latest patch [v18] was re-designed to make tablesync slots as
TEMPORARY [ak0122], so this code in DropSubscription is modified a
lot. This review comment is not applicable anymore.
2.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE?
OK. The code comment was correct, but maybe confusing. I have reworded
it in the latest patch [v18].
3.
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);
The function name and message make it sound like we drop the slot
and origin at any interrupt. Isn't it better to name it as
tablesync_cleanup_at_shutdown()?
The latest patch [v18] was re-designed to make tablesync slots as
TEMPORARY [ak0122], so this cleanup function is removed. This review
comment is not applicable anymore.
4.
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is missing. */
I think we can avoid using the missing_ok and drop_slot_needed variables.
The latest patch [v18] was re-designed to make the tablesync slots
TEMPORARY [ak0122], so this code no longer exists. This review comment
is not applicable anymore.
5. Can we drop the origin along with the slot in
process_syncing_tables_for_sync() instead of
process_syncing_tables_for_apply()? I think this is possible because
of the other changes you made in origin.c. Also, if possible, we can
try to use the same code to drop the slot and origin in
tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync.
No, the origin tracking cannot be dropped by the tablesync worker for
the normal use-case, even with my modified origin.c; it would fail
during the commit of the transaction because, while trying to do
replorigin_session_advance, it would find that the asserted origin id
was no longer there.
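To spell out that failure mode, here is a hedged sketch (the variable
names are from origin.c/worker.c, but the call chain is simplified):

/*
 * Simplified view of the sync worker's commit path: the commit
 * machinery advances the session origin, so the origin record must
 * still exist at this point.
 */
replorigin_session_origin_lsn = commit_data->end_lsn;
replorigin_session_origin_timestamp = commit_data->committime;
CommitTransactionCommand();     /* advances the session origin */

/*
 * If the sync worker had dropped its own origin before this commit,
 * the advance would not find the origin record and would error out.
 */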
Also, the latest patch [v18] was re-designed to make the tablesync
slots TEMPORARY [ak0122], so the tablesync_cleanup_at_interrupt
function no longer exists (and the origin.c change of v17 has also
been removed).
6.
+    if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+    {
+        /*
+         * The COPY phase was previously done, but tablesync then crashed/etc
+         * before it was able to finish normally.
+         */

There seems to be a typo (crashed/etc) in the above comment.
OK. Fixed in latest patch [v18].
----
[ak0122] = /messages/by-id/CAA4eK1LS0_mdVx2zG3cS+H88FJiwyS3kZi7zxijJ_gEuw2uQ2g@mail.gmail.com
[v18] = /messages/by-id/CAHut+Pvm0R=Mn_uVN_JhK0scE54V6+EDGHJg1WYJx0Q8HX_mkQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+  or die "no error stop for the pre-existing origin";

In this test, isn't it better to check for the datasync state like
below? 004_sync.pl has some other similar tests.

my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
  or die "Timed out while waiting for subscriber to start sync";

Is there a reason why we can't use the existing way to check for
failure in this case?
Since the new design now uses temporary slots, is this test case still
required? If required, I can change it accordingly.
regards,
Ajin Cherian
Fujitsu Australia
On Sat, Jan 23, 2021 at 8:37 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+  or die "no error stop for the pre-existing origin";

In this test, isn't it better to check for the datasync state like
below? 004_sync.pl has some other similar tests.

my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
  or die "Timed out while waiting for subscriber to start sync";

Is there a reason why we can't use the existing way to check for
failure in this case?

Since the new design now uses temporary slots, is this test case still
required?
I think so. But do you have any reason to believe that it won't be
required anymore?
--
With Regards,
Amit Kapila.
On Sat, Jan 23, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I think so. But do you have any reason to believe that it won't be
required anymore?
A temporary slot will not clash with a permanent slot of the same name.
regards,
Ajin Cherian
Fujitsu
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
Few comments:
=============
1.
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
I don't think we need to be specific here that the sync worker sets the
FINISHEDCOPY state.
2.
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
I don't think the above includes are required. They seem to be a
remnant of the previous approach.
3.
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
Why do we need these changes? If you have done it for the
code-readability purpose then we can consider this as a separate patch
because I don't see why these are required w.r.t. this patch.
4.
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.
5.
This is WAL
+ * logged for for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
/for for/for
6.
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for
\"%s\"", originname);
I don't think we need this and the DEBUG1 message in
AlterSubscription_refresh. It is fine to print this information for
background workers like the apply worker, but I'm not sure we need it
here. DropSubscription drops the origin of the apply worker but doesn't
use such a DEBUG message, so I guess we don't need it for tablesync
origins either.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE; is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where, during the catchup phase, we have received a few commits
and then the tablesync worker crashes/errors out. Basically, check
whether the replication is continued from the same point. I understand
that this can only be tested by adding some logs and we might not be
able to write a test for it.
--
With Regards,
Amit Kapila.
FYI - I have done some long-running testing using the current patch [v18].
1. The src/test/subscription TAP tests:
- Subscription TAP tests were executed in a loop X 150 iterations.
- Duration 5 hrs.
- All iterations report "Result: PASS"
2. The postgres "make check" tests:
- make check was executed in a loop X 150 iterations.
- Duration 2 hrs.
- All iterations report "All 202 tests passed"
---
[v18] /messages/by-id/CAHut+Pvm0R=Mn_uVN_JhK0scE54V6+EDGHJg1WYJx0Q8HX_mkQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
Few comments:
=============
1.
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.

I don't think we need to be specific here that the sync worker sets the
FINISHEDCOPY state.
This was meant to indicate that *only* the sync worker knows about the
FINISHEDCOPY state, whereas all the other states are known (assigned
and/or used) by *both* kinds of workers. But I can remove it if you
feel that distinction is not useful.
4.
-    /*
-     * To build a slot name for the sync work, we are limited to NAMEDATALEN -
-     * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
-     * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
-     * NAMEDATALEN on the remote that matters, but this scheme will also work
-     * reasonably if that is different.)
-     */
-    StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
-    slotname = psprintf("%.*s_%u_sync_%u",
-                        NAMEDATALEN - 28,
-                        MySubscription->slotname,
-                        MySubscription->oid,
-                        MyLogicalRepWorker->relid);
+    /* Calculate the name of the tablesync slot. */
+    slotname = ReplicationSlotNameForTablesync(
+                                               MySubscription->oid,
+                                               MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.
The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION)
and so including the subscription slot name as part of the tablesync
slot name was considered to be:
a) possibly risky/undefined, if the subscription slot_name = NONE
b) confusing, if we end up using 2 different slot names for the same
tablesync (e.g. if the subscription slot name is changed before a sync
worker is re-launched).
And since this subscription slot name part is not necessary for
uniqueness anyway, it was removed from the tablesync slot name to
eliminate those concerns.
Also, the tablesync slot name calculation was encapsulated as a
separate function because previously (i.e. before v18) it was used by
various other cleanup code. I still like it better as a function, but
now it is only called from one place, so we could put that code back
inline if you prefer it how it was.
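For reference, such a helper can be quite small. The sketch below is
illustrative only: the exact format string is an assumption, the point
being that the subscription OID plus table OID already guarantee
uniqueness without the subscription's slotname:

/*
 * Sketch of a tablesync slot name builder that does not depend on the
 * subscription's slotname. The "pg_%u_sync_%u" layout is an
 * illustrative assumption, not necessarily what the patch uses.
 */
static char *
ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
{
    return psprintf("pg_%u_sync_%u", suboid, relid);
}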
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sun, Jan 24, 2021 at 5:54 PM Peter Smith <smithpb2250@gmail.com> wrote:
4.
-    /*
-     * To build a slot name for the sync work, we are limited to NAMEDATALEN -
-     * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
-     * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
-     * NAMEDATALEN on the remote that matters, but this scheme will also work
-     * reasonably if that is different.)
-     */
-    StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
-    slotname = psprintf("%.*s_%u_sync_%u",
-                        NAMEDATALEN - 28,
-                        MySubscription->slotname,
-                        MySubscription->oid,
-                        MyLogicalRepWorker->relid);
+    /* Calculate the name of the tablesync slot. */
+    slotname = ReplicationSlotNameForTablesync(
+                                               MySubscription->oid,
+                                               MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.

The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION)
and so including the subscription slot name as part of the tablesync
slot name was considered to be:
a) possibly risky/undefined, if the subscription slot_name = NONE
b) confusing, if we end up using 2 different slot names for the same
tablesync (e.g. if the subscription slot name is changed before a sync
worker is re-launched).
And since this subscription slot name part is not necessary for
uniqueness anyway, it was removed from the tablesync slot name to
eliminate those concerns.

Also, the tablesync slot name calculation was encapsulated as a
separate function because previously (i.e. before v18) it was used by
various other cleanup codes. I still like it better as a function, but
now it is only called from one place so we could put that code back
inline if you prefer it how it was.

It turns out those (a/b) concerns I wrote above are maybe unfounded,
because it seems it is not possible to alter slot_name = NONE unless
the subscription is first DISABLED.
So probably I can revert all this tablesync slot name calculation back
to how it originally was in the OSS HEAD if you want.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi Amit.
PSA the v19 patch for the Tablesync Solution1.
Main differences from v18:
+ Patch has been rebased off HEAD @ 24/Jan
+ Addressing some review comments [ak0123]
[ak0123] /messages/by-id/CAA4eK1JhpuwujrV6ABMmZ3jXfW37ssZnJ3fikrY7rRdvoEmu_g@mail.gmail.com
====
Features:
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v19-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v19-0001-Tablesync-Solution1.patchDownload
From d3153f6469800f0e1eaa115e674eff1efeba3fc0 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 25 Jan 2021 13:07:40 +1100
Subject: [PATCH v19] Tablesync Solution1.
====
Features:
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 33 +++++++
src/backend/replication/logical/tablesync.c | 139 ++++++++++++++++++++++++++--
src/backend/replication/logical/worker.c | 18 +---
src/include/catalog/pg_subscription_rel.h | 2 +
src/test/subscription/t/004_sync.pl | 96 ++++++++++++++++++-
6 files changed, 265 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1a..82e74e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7662,6 +7662,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..af13448 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -649,10 +649,19 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
if (!bsearch(&relid, pubrel_local_oids,
list_length(pubrel_names), sizeof(Oid), oid_cmp))
{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
RemoveSubscriptionRel(sub->oid, relid);
logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false /* nowait */ );
+
ereport(DEBUG1,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
@@ -930,6 +939,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1052,29 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (origins).
+ *
+ * Any READY-state relations have already done this.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false /* nowait */ );
+ }
+ list_free(rstates);
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..ae446f5 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * with this (non-memory) state then the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -270,8 +277,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -279,11 +284,23 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
TimeLineID tli;
+ /*
+ * Change state to SYNCDONE.
+ */
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -404,6 +421,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -412,6 +432,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop must be done here, not in the
+ * process_syncing_tables_for_sync function, because if the
+ * tablesync worker process attempted to drop its own origin
+ * then it would fail (origin is "busy").
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false /* nowait */ );
+ }
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -824,6 +865,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -874,7 +917,38 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Slot creation passes NULL lsn because the origin startpos is got
+ * from origin tracking this time, not from the slot.
+ */
+ walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
+ CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +964,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -942,6 +1013,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..52915d9 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +151,99 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep;");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# recreate the subscription again, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is 'slotname'_'suboid'_sync_'tableoid'.
+my $slotname = 'tap_sub_' . $subid . '_' . 'sync_' . $relid;
+
+# temporarily, create a slot having the same name of the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 10 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_subscriber->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
--
1.8.3.1
v19-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v19-0002-Tablesync-extra-logging.patchDownload
From e1cc6cad91752bb7e71bb640f123684a5891ae55 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 25 Jan 2021 13:25:07 +1100
Subject: [PATCH v19] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 20 +++++++++++++++++--
src/backend/replication/logical/tablesync.c | 30 +++++++++++++++++++++++++++--
2 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index af13448..ef0817c 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -660,10 +660,18 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
+ {
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropping origin tracking for \"%s\"",
+ originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"",
+ originname);
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ ereport(LOG,
+ (errmsg("!!>> table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
sub->name)));
@@ -1071,7 +1079,15 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
+ {
+ elog(LOG,
+ "!!>> DropSubscription: dropping origin tracking for \"%s\"",
+ originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG,
+ "!!>> DropSubscription: droppped origin tracking for \"%s\"",
+ originname);
+ }
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ae446f5..3693f4c 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -444,10 +444,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
- "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
/*
@@ -929,12 +932,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* Slot creation passes NULL lsn because the origin startpos is got
* from origin tracking this time, not from the slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
@@ -943,8 +951,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -993,6 +1007,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1024,13 +1041,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* logged for the purpose of recovery. Locks are to prevent the
* replication origin from vanishing while advancing.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
--
1.8.3.1
On Mon, Jan 25, 2021 at 6:15 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 24, 2021 at 5:54 PM Peter Smith <smithpb2250@gmail.com> wrote:
4.
-    /*
-     * To build a slot name for the sync work, we are limited to NAMEDATALEN -
-     * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
-     * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
-     * NAMEDATALEN on the remote that matters, but this scheme will also work
-     * reasonably if that is different.)
-     */
-    StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
-    slotname = psprintf("%.*s_%u_sync_%u",
-                        NAMEDATALEN - 28,
-                        MySubscription->slotname,
-                        MySubscription->oid,
-                        MyLogicalRepWorker->relid);
+    /* Calculate the name of the tablesync slot. */
+    slotname = ReplicationSlotNameForTablesync(
+                                               MySubscription->oid,
+                                               MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.

The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION)
and so including the subscription slot name as part of the tablesync
slot name was considered to be:
a) possibly risky/undefined, if the subscription slot_name = NONE
b) confusing, if we end up using 2 different slot names for the same
tablesync (e.g. if the subscription slot name is changed before a sync
worker is re-launched).
And since this subscription slot name part is not necessary for
uniqueness anyway, it was removed from the tablesync slot name to
eliminate those concerns.

Also, the tablesync slot name calculation was encapsulated as a
separate function because previously (i.e. before v18) it was used by
various other cleanup codes. I still like it better as a function, but
now it is only called from one place so we could put that code back
inline if you prefer it how it was.

It turns out those (a/b) concerns I wrote above are maybe unfounded,
because it seems it is not possible to alter slot_name = NONE unless
the subscription is first DISABLED.
Yeah, but I think the user can still change to some other predefined
slot_name. However, I guess it doesn't matter unless it can lead to
what you have mentioned in (a). As that can't happen, it is probably
better to take out that change from the patch. I see your point of
moving this calculation to a separate function, but I'm not sure it is
worth it unless we have to call it from multiple places or it simplifies the
existing code.
--
With Regards,
Amit Kapila.
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
2.
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"

I don't think the above includes are required. They seem to be a
remnant of the previous approach.
OK. Fixed in the latest patch [v19].
3.
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+    bool        sync_done = false;

     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+    sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+        current_lsn >= MyLogicalRepWorker->relstate_lsn;
+    SpinLockRelease(&MyLogicalRepWorker->relmutex);

-    if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
-        current_lsn >= MyLogicalRepWorker->relstate_lsn)
+    if (sync_done)
     {
         TimeLineID tli;

+        /*
+         * Change state to SYNCDONE.
+         */
+        SpinLockAcquire(&MyLogicalRepWorker->relmutex);

Why do we need these changes? If you have done it for the
code-readability purpose then we can consider this as a separate patch
because I don't see why these are required w.r.t. this patch.
Yes, it was for code readability in v17 when this function used to be
much larger. But it is not very necessary anymore and has been
reverted in the latest patch [v19].
4.
-    /*
-     * To build a slot name for the sync work, we are limited to NAMEDATALEN -
-     * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
-     * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
-     * NAMEDATALEN on the remote that matters, but this scheme will also work
-     * reasonably if that is different.)
-     */
-    StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
-    slotname = psprintf("%.*s_%u_sync_%u",
-                        NAMEDATALEN - 28,
-                        MySubscription->slotname,
-                        MySubscription->oid,
-                        MyLogicalRepWorker->relid);
+    /* Calculate the name of the tablesync slot. */
+    slotname = ReplicationSlotNameForTablesync(
+                                               MySubscription->oid,
+                                               MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.
The tablesync slot name changes were not strictly necessary, so the
code is all reverted to be the same as OSS HEAD now in the latest
patch [v19].
5.
This is WAL
+     * logged for for the purpose of recovery. Locks are to prevent the
+     * replication origin from vanishing while advancing.

/for for/for
OK. Fixed in the latest patch [v19].
6.
+    /* Remove the tablesync's origin tracking if exists. */
+    snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+    originid = replorigin_by_name(originname, true);
+    if (originid != InvalidRepOriginId)
+    {
+        elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);

I don't think we need this and the DEBUG1 message in
AlterSubscription_refresh. It is fine to print this information for
background workers like the apply worker, but I'm not sure we need it
here. DropSubscription drops the origin of the apply worker but doesn't
use such a DEBUG message, so I guess we don't need it for tablesync
origins either.
OK. These DEBUG1 logs are removed in the latest patch [v19].
----
[v19] /messages/by-id/CAHut+Psj7Xm8C1LbqeAbk-3duyS8xXJtL9TiGaeu3P8g272mAA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sun, Jan 24, 2021 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few comments:
=============
1.
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.

I don't think we need to be specific here that the sync worker sets the
FINISHEDCOPY state.

This was meant to indicate that *only* the sync worker knows about the
FINISHEDCOPY state, whereas all the other states are known (assigned
and/or used) by *both* kinds of workers. But I can remove it if you
feel that distinction is not useful.
Okay, but I feel you can mention that in the description you have
added for the FINISHEDCOPY state. It looks a bit odd here, and the message
you want to convey is also not that clear.
--
With Regards,
Amit Kapila.
On Sat, Jan 23, 2021 at 11:08 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Sat, Jan 23, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I think so. But do you have any reason to believe that it won't be
required anymore?

A temporary slot will not clash with a permanent slot of the same name.
I have tried below and it seems to be clashing:
postgres=# SELECT 'init' FROM
pg_create_logical_replication_slot('test_slot2', 'test_decoding');
?column?
----------
init
(1 row)
postgres=# SELECT 'init' FROM
pg_create_logical_replication_slot('test_slot2', 'test_decoding',
true);
ERROR: replication slot "test_slot2" already exists
Note that the third parameter in the second statement above indicates
whether it is a temporary slot or not. What am I missing?
--
With Regards,
Amit Kapila.
On Mon, Jan 25, 2021 at 8:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
2.
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"

I don't think the above includes are required. They seem to be a
remnant of the previous approach.

OK. Fixed in the latest patch [v19].

You seem to have forgotten to remove #include "replication/slot.h".
Check; if it is not required then remove that as well.
--
With Regards,
Amit Kapila.
On Mon, Jan 25, 2021 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v19 patch for the Tablesync Solution1.
I see one race condition in this patch where we try to drop the origin
via both the apply process and DropSubscription. I think it can lead to
the error "cache lookup failed for replication origin with oid %u". The
same problem can happen via the exposed API pg_replication_origin_drop,
but probably because that is not used concurrently, nobody has faced
this issue. I think for the matter of this patch we can try to suppress
such an error either via try..catch, or by adding a missing_ok argument
to the replorigin_drop API, or we can just note in the comments that
such a race exists. Additionally, we should start a new thread about
the existence of this problem in pg_replication_origin_drop. What do
you think?
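To illustrate the missing_ok idea, here is a hedged sketch (the helper
name drop_origin_if_exists is hypothetical, and a real fix would also
need to guard against the origin disappearing between the lookup and
the drop):

/*
 * Sketch: tolerate a concurrently-dropped origin instead of erroring.
 */
static void
drop_origin_if_exists(const char *originname, bool nowait)
{
    RepOriginId originid;

    originid = replorigin_by_name(originname, true /* missing_ok */ );
    if (originid == InvalidRepOriginId)
        return;                 /* someone else already dropped it */

    /*
     * Note: a concurrent drop can still slip in here; the real fix must
     * recheck (or catch the error) under an appropriate lock.
     */
    replorigin_drop(originid, nowait);
}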
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v20 patch for the Tablesync Solution1.
Main differences from v19:
+ Updated TAP test [ak0123-7]
+ Fixed comment [ak0125-1]
+ Removed redundant header [ak0125-2]
+ Protection against race condition [ak0125-race]
[ak0123-7] /messages/by-id/CAA4eK1JhpuwujrV6ABMmZ3jXfW37ssZnJ3fikrY7rRdvoEmu_g@mail.gmail.com
[ak0125-1] /messages/by-id/CAA4eK1JmP2VVpH2=O=5BBbuH7gyQtWn40aXp_Jyjn1+Kggfq8A@mail.gmail.com
[ak0125-2] /messages/by-id/CAA4eK1L1j5sfBgHb0-H-+2quBstsA3hMcDfP-4vLuU-UF43nXQ@mail.gmail.com
[ak0125-race] /messages/by-id/CAA4eK1+yeLwBCkTvTdPM-hSk1fr6jT8KJc362CN8zrGztq_JqQ@mail.gmail.com
====
Features:
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case.
Known Issues:
* Some records arriving between FINISHEDCOPY and SYNCDONE state may be
lost (currently under investigation).
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v20-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v20-0002-Tablesync-extra-logging.patchDownload
From 878e4ed4fdb8e4b67c1f98bc4dd22c4f95654a5e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 25 Jan 2021 21:36:46 +1100
Subject: [PATCH v20] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 6 ++++--
src/backend/replication/logical/tablesync.c | 31 +++++++++++++++++++++++++++++
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 676da19..154a2da 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -654,10 +654,11 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
logicalrep_worker_stop_at_commit(sub->oid, relid);
/* Remove the tablesync's origin tracking if exists. */
+ elog(LOG, "!!>> AlterSubscription_refresh: tablesync_replorigin_drop");
tablesync_replorigin_drop(sub->oid, relid, false /* nowait */ );
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ ereport(LOG,
+ (errmsg("!!>> table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
sub->name)));
@@ -1062,6 +1063,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
continue;
/* Remove the tablesync's origin tracking if exists. */
+ elog(LOG, "!!>> DropSubscription: tablesync_replorigin_drop");
tablesync_replorigin_drop(subid, relid, false /* nowait */ );
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 5870334..a3db3e5 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -143,7 +143,13 @@ tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
{
PG_TRY();
{
+ elog(LOG,
+ "!!>> tablesync_replorigin_drop: dropping origin \"%s\"",
+ originname);
replorigin_drop(originid, nowait);
+ elog(LOG,
+ "!!>> tablesync_replorigin_drop: dropped origin \"%s\"",
+ originname);
}
PG_CATCH();
{
@@ -474,6 +480,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
* tablesync worker process attempted to drop its own origin
* then it would fail (origin is "busy").
*/
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: tablesync_replorigin_drop");
tablesync_replorigin_drop(MyLogicalRepWorker->subid,
rstate->relid, false /* nowait */ );
@@ -956,12 +964,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* Slot creation passes NULL lsn because the origin startpos is got
* from origin tracking this time, not from the slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
@@ -970,8 +983,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1020,6 +1039,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1051,13 +1073,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* logged for the purpose of recovery. Locks are to prevent the
* replication origin from vanishing while advancing.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
--
1.8.3.1
v20-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v20-0001-Tablesync-Solution1.patchDownload
From 99bad2dfb4b1a8271c5beedbff0df3550a74775b Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 25 Jan 2021 20:26:11 +1100
Subject: [PATCH v20] Tablesync Solution1.
Features:
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 24 ++++
src/backend/replication/logical/tablesync.c | 166 ++++++++++++++++++++++++++--
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/worker_internal.h | 2 +
src/test/subscription/t/004_sync.pl | 69 +++++++++++-
7 files changed, 258 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1a..82e74e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7662,6 +7662,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..676da19 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -653,6 +653,9 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if exists. */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */ );
+
ereport(DEBUG1,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
@@ -930,6 +933,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1046,26 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (origins).
+ *
+ * Any READY-state relations have already done this.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
+
+ /* Remove the tablesync's origin tracking if exists. */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */ );
+ }
+ list_free(rstates);
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..5870334 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,7 +107,9 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -113,6 +120,43 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time (e.g. during
+ * DropSubscription and process_syncing_tables_for_apply). The loser of that
+ * race would give an ERROR saying that it failed to find the expected
+ * originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -270,8 +314,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -279,11 +321,23 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
TimeLineID tli;
+ /*
+ * Change state to SYNCDONE.
+ */
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -412,6 +466,20 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal case origin drop must be done here, not in the
+ * process_syncing_tables_for_sync function, because if the
+ * tablesync worker process attempted to drop its own origin
+ * then it would fail (origin is "busy").
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -824,6 +892,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -874,7 +944,38 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Slot creation passes a NULL lsn because this time the origin
+ * startpos is obtained from origin tracking, not from the slot.
+ */
+ walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
+ CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +991,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -942,6 +1040,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..67bc911 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -84,6 +84,8 @@ extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..ec17c38 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,6 +151,71 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# drop the table tab_rep from publisher and subscriber
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep;");
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep;");
+
+# recreate the subscription, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# To build the name of the tablesync slot, we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is 'slotname'_'suboid'_sync_'tableoid'.
+my $slotname = 'tap_sub_' . $subid . '_' . 'sync_' . $relid;
+
+# Temporarily create a slot with the same name as the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# It will be stuck on data sync, as slot creation will fail because the slot already exists.
+$node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# Now drop the offending slot; the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
--
1.8.3.1
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+  or die "no error stop for the pre-existing origin";

In this test, isn't it better to check for datasync state like below?
004_sync.pl has some other similar test.

my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
  or die "Timed out while waiting for subscriber to start sync";

Is there a reason why we can't use the existing way to check for
failure in this case?
The TAP test is updated in the latest patch [v20].
----
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 25, 2021 at 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Jan 24, 2021 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few comments:
=============
1.
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.

I don't think we need to be specific here that sync worker sets
FINISHEDCOPY state.

This was meant to indicate that *only* the sync worker knows about the
FINISHEDCOPY state, whereas all the other states are either known
(assigned and/or used) by *both* kinds of workers. But, I can remove
it if you feel that distinction is not useful.

Okay, but I feel you can mention that in the description you have
added for FINISHEDCOPY state. It looks a bit odd here and the message
you want to convey is also not that clear.
The comment is updated in the latest patch [v20].
----
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 25, 2021 at 2:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jan 25, 2021 at 8:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
2.
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"I don't think the above includes are required. They seem to the
remnant of the previous approach.OK. Fixed in the latest patch [v19].
You seem to forgot removing #include "replication/slot.h". Check, if
it is not required then remove that as well.
Fixed in the latest patch [v20].
----
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 25, 2021 at 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jan 25, 2021 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v19 patch for the Tablesync Solution1.
I see one race condition in this patch where we try to drop the origin
via apply process and DropSubscription. I think it can lead to the
error "cache lookup failed for replication origin with oid %u". The
same problem can happen via exposed API pg_replication_origin_drop but
probably because this is not used concurrently so nobody faced this
issue. I think for the matter of this patch we can try to suppress
such an error either via try..catch, or by adding missing_ok argument
to replorigin_drop API, or we can just add to comments that such a
race exists.
OK. This has been isolated to a common function called from 3 places.
The potential race ERROR is suppressed by TRY/CATCH.
Please see the code of the latest patch [v20].
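For reference, the same race can in principle be reproduced through the
SQL-level API by two sessions dropping the same origin concurrently.
A hypothetical sketch (the origin name below is made up, following the
pg_%u_%u naming used for tablesync origins):

-- Run concurrently in two subscriber sessions:
SELECT pg_replication_origin_drop('pg_16389_16384');
-- One session succeeds; the loser of the race reports something like:
-- ERROR:  cache lookup failed for replication origin with oid 1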
Additionally, we should try to start a new thread for the
existence of this problem in pg_replication_origin_drop. What do you
think?
OK. It is on my TODO list.
----
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 25, 2021 at 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jan 25, 2021 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v19 patch for the Tablesync Solution1.
I see one race condition in this patch where we try to drop the origin
via apply process and DropSubscription. I think it can lead to the
error "cache lookup failed for replication origin with oid %u". The
same problem can happen via exposed API pg_replication_origin_drop but
probably because this is not used concurrently so nobody faced this
issue. I think for the matter of this patch we can try to suppress
such an error either via try..catch, or by adding missing_ok argument
to replorigin_drop API, or we can just add to comments that such a
race exists. Additionally, we should try to start a new thread for the
existence of this problem in pg_replication_origin_drop. What do you
think?
OK. A new thread [ps0127] for this problem was started.
---
[ps0127] = /messages/by-id/CAHut+PuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE, is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where during the catchup phase we have received few commits and
then the tablesync worker is crashed/errored out? Basically, check if
the replication is continued from the same point?
I have tested this and it didn't work; see the example below.
Publisher-side
================
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that the
tablesync worker stops there.
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
During debugging, stop after we mark the FINISHEDCOPY state.
Publisher-side
================
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl1(somedata, text) VALUES (1, 4);
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit (which will be for values (1,3)),
note down the origin position in apply_handle_commit_internal and
somehow error out. I have forced the debugger to jump to the last line
in apply_dispatch where the error is raised.
- After the error, the tablesync worker is restarted again and it
starts from the position noted in the previous step
- It exits without replaying the WAL for (1,4)
So, on the subscriber side, you will see 3 records; the fourth is
missing. Now, if you insert more records on the publisher, it will
replay those anyway, but the fourth one stays missing.
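To make the symptom concrete, this is roughly what a check on the
subscriber shows at this point (a sketch, assuming the table and data
from the steps above):

-- Subscriber-side, after the tablesync worker restarted and exited:
SELECT count(*) FROM mytbl1;    -- returns 3, not 4
SELECT somedata, text FROM mytbl1 ORDER BY id;
-- rows (1,1), (1,2) and (1,3) are present; (1,4) was never replayed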
The temporary slots didn't seem to work because we created a new
temporary slot after the crash and asked it to start decoding from
the point we noted in origin_lsn. The publisher didn't hold the
required WAL, as our slot was temporary, so it started sending from
some later point. We retain WAL based on a slot's restart_lsn position
and wal_keep_size. For our case, the positions of the slots matter,
and as we have created temporary slots, there is no way for the
publisher to save that WAL.
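The WAL-retention point can be seen from the publisher's slot catalog;
a sketch of what one could inspect (not output from the actual run):

-- Publisher-side: a temporary slot vanishes on crash/disconnect, so
-- nothing pins restart_lsn and the older WAL is free to be recycled.
SELECT slot_name, temporary, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;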
In this particular case, even if the WAL had been there, we only pass
the start_decoding_at position and don't pass restart_lsn, so it
picked an arbitrary location (the current insert position in WAL)
which is ahead of the start_decoding_at point, and therefore it never
sent the required fourth record. Now, I don't think it will work even
if we somehow sent the correct restart_lsn, because, as I wrote
earlier, there is no guarantee that the earlier WAL would have been
saved.
At this point, I can't think of any way to fix this problem except
going back to the previous approach of permanent slots, but let me
know if you have any ideas to salvage this approach.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v21 patch for the Tablesync Solution1.
Main differences from v20:
+ Rebased to latest OSS HEAD @ 27/Jan
+ v21 is a merge of patches [v17] and [v20], which became necessary
when it was found [ak0127] that the v20 usage of TEMPORARY tablesync
slots did not work correctly. v21 reverts to using PERMANENT tablesync
slots, the same as implemented in v17, while retaining the other
review comment fixes made for v18, v19, and v20.
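As a quick way to observe the new state from psql while this is going
on (a sketch against a patched server; 'f' is the FINISHEDCOPY code
this patch adds to pg_subscription_rel):

-- Subscriber-side:
SELECT srrelid::regclass AS tab, srsubstate
FROM pg_subscription_rel;
-- srsubstate = 'f' means the copy finished but the sync is not yet
-- done, so a re-launched tablesync worker will skip the copy phase.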
----
[v17] /messages/by-id/CAHut+Pt9+g8qQR0kMC85nY-O4uDQxXboamZAYhHbvkebzC9fAQ@mail.gmail.com
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
[ak0127] /messages/by-id/CAA4eK1LDsj9kw4FbWAw3CMHyVsjafgDum03cYy-wpGmor=8-1w@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v21-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v21-0002-Tablesync-extra-logging.patchDownload
From 7ce24a04e4b3df8ac3921bf41a45e97edeb3039b Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 28 Jan 2021 17:08:04 +1100
Subject: [PATCH v21] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing; it is not intended for committing.
---
src/backend/replication/logical/tablesync.c | 56 +++++++++++++++++++++++++++--
1 file changed, 54 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f8cf93e..26eae37 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -145,7 +145,13 @@ tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
{
PG_TRY();
{
+ elog(LOG,
+ "!!>> tablesync_replorign_drop: droppping origin OID %d, named \"%s\"",
+ originid, originname);
replorigin_drop(originid, nowait);
+ elog(LOG,
+ "!!>> tablesync_replorign_drop: dropped origin OID %d, named \"%s\"",
+ originid, originname);
}
PG_CATCH();
{
@@ -317,6 +323,10 @@ tablesync_cleanup_at_shutdown(void)
Oid subid = MySubscription->oid;
Oid relid = MyLogicalRepWorker->relid;
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_shutdown for relid = %d",
+ MyLogicalRepWorker->relid);
+
/*
* Cleanup the tablesync slot, if needed.
*
@@ -349,7 +359,13 @@ tablesync_cleanup_at_shutdown(void)
ReplicationSlotNameForTablesync(MySubscription->slotname,
subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_shutdown: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_shutdown: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/*
@@ -363,6 +379,8 @@ tablesync_cleanup_at_shutdown(void)
StartTransactionCommand();
}
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_shutdown: call tablesync_replorigin_drop");
tablesync_replorigin_drop(subid, relid, false /* nowait */ );
/*
@@ -409,7 +427,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
*/
ReplicationSlotNameForTablesync(MySubscription->slotname, subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -563,6 +587,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
* orign then would prevent the origin from advancing properly
* on commit TX.
*/
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: call tablesync_replorigin_drop");
tablesync_replorigin_drop(MyLogicalRepWorker->subid,
rstate->relid, false /* nowait */ );
@@ -1073,6 +1099,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
@@ -1080,8 +1108,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
CommitTransactionCommand();
@@ -1137,6 +1171,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* used for the catchup phase after COPY is done, so tell it to use
* the snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1168,13 +1205,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* WAL logged for the purpose of recovery. Locks are to prevent
* the replication origin from vanishing while advancing.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1203,7 +1249,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1214,8 +1266,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
- "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
v21-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v21-0001-Tablesync-Solution1.patchDownload
From d595ff741ba6880b1916e18e699b81d88d21ee40 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 28 Jan 2021 16:36:09 +1100
Subject: [PATCH v21] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added to the process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up by process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from ProcessInterrupts. It is invoked indirectly by DropSubscription/AlterSubscription when they signal the tablesync worker to stop.
* Updates to PG docs.
* New TAP test case.
Known Issues:
* Dangling tablesync slots may be possible if some race scenario occurs during Drop/AlterSubscription.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 18 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 93 ++++---
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 368 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 27 +-
src/backend/tcop/postgres.c | 6 +
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/logicalworker.h | 2 +
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 2 +
src/test/subscription/t/004_sync.pl | 69 +++++-
13 files changed, 503 insertions(+), 96 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826..920a39d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..e1b20ea 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,18 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>%s_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>slot_name</parameter>, Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..e31ba6e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1042,7 +1042,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
- /* Clean up dependencies */
+ /* Clean up dependencies. */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
/* Remove any associated relation synchronization states. */
@@ -1055,61 +1055,92 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
replorigin_drop(originid, false);
/*
- * If there is no slot associated with the subscription, we can finish
- * here.
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
*/
- if (!slotname)
+ if (slotname)
{
- table_close(rel, NoLock);
- return;
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
+ {
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
}
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 9bd761a..77aae35 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -357,7 +357,7 @@ restart:
if (state->roident == roident)
{
/* found our slot, is it busy? */
- if (state->acquired_by != 0)
+ if (state->acquired_by != 0 && state->acquired_by != MyProcPid)
{
ConditionVariable *cv;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..f8cf93e 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -113,6 +123,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -260,6 +306,77 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
}
/*
+ * The sync worker cleans up any slot / origin resources it may have created.
+ * This function is called from ProcessInterrupts() as a result of the
+ * tablesync worker being signalled.
+ */
+void
+tablesync_cleanup_at_shutdown(void)
+{
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ /*
+ * Cleanup the tablesync slot, if needed.
+ *
+ * If state is SYNCDONE or READY then the slot has already been dropped.
+ */
+ if (wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /*
+ * End wal streaming so the wrconn can be re-used to drop the slot.
+ */
+ PG_TRY();
+ {
+ walrcv_endstreaming(wrconn, &tli);
+ }
+ PG_CATCH();
+ {
+ /*
+ * It is possible that the walrcv_startstreaming was not yet
+ * called (e.g. the interrupt initiating this cleanup may have
+ * happened during the table COPY phase) so suppress any error
+ * here to cope with that scenario.
+ */
+ }
+ PG_END_TRY();
+
+ ReplicationSlotNameForTablesync(MySubscription->slotname,
+ subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+
+ /*
+ * Remove the tablesync's origin tracking if it exists.
+ *
+ * The origin APIs must be called within a transaction, and this
+ * transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
+ tablesync_replorigin_drop(subid, relid, false /* nowait */ );
+
+ /*
+ * CommitTransactionCommand would normally attempt to advance the origin,
+ * but now that the origin has been dropped that would fail, so we need to
+ * reset the replorigin_session here to prevent this error happening.
+ */
+ replorigin_session_reset();
+ replorigin_session_origin = InvalidRepOriginId;
+
+ finish_sync_worker(); /* doesn't return. */
+}
+
+/*
* Handle table synchronization cooperation from the synchronization
* worker.
*
@@ -270,30 +387,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(MySubscription->slotname, subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +554,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own
+ * origin then that would prevent the origin from advancing properly
+ * on the commit of the transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +965,42 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(const char *subslotname, Oid suboid, Oid relid, char *syncslotname)
+{
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
+ * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
+ * NAMEDATALEN on the remote that matters, but this scheme will also work
+ * reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ if (syncslotname)
+ {
+ sprintf(syncslotname,
+ "%.*s_%u_sync_%u",
+ NAMEDATALEN - 28,
+ subslotname, suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("%.*s_%u_sync_%u",
+ NAMEDATALEN - 28,
+ subslotname, suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +1017,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +1044,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->slotname,
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +1061,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1103,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1128,97 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..0ea1646 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
@@ -3112,3 +3104,12 @@ IsLogicalWorker(void)
{
return MyLogicalRepWorker != NULL;
}
+
+/*
+ * Is current process a logical replication tablesync worker?
+ */
+bool
+IsLogicalWorkerTablesync(void)
+{
+ return am_tablesync_worker();
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cb5a961..8b49dd1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3086,9 +3086,15 @@ ProcessInterrupts(void)
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
+ {
+ /* Tablesync workers do their own cleanups. */
+ if (IsLogicalWorkerTablesync())
+ tablesync_cleanup_at_shutdown(); /* does not return. */
+
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating logical replication worker due to administrator command")));
+ }
else if (IsLogicalLauncher())
{
ereport(DEBUG1,
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/logicalworker.h b/src/include/replication/logicalworker.h
index 2ad61a0..585df5f 100644
--- a/src/include/replication/logicalworker.h
+++ b/src/include/replication/logicalworker.h
@@ -15,5 +15,7 @@
extern void ApplyWorkerMain(Datum main_arg);
extern bool IsLogicalWorker(void);
+extern bool IsLogicalWorkerTablesync(void);
+extern void tablesync_cleanup_at_shutdown(void);
#endif /* LOGICALWORKER_H */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..db51cf2 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(const char *subslotname, Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..67bc911 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -84,6 +84,8 @@ extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..ec17c38 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,6 +151,71 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# drop the table tab_rep from publisher and subscriber
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep;");
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep;");
+
+# recreate the subscription again, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is 'slotname'_'suboid'_sync_'tableoid'.
+my $slotname = 'tap_sub_' . $subid . '_' . 'sync_' . $relid;
+
+# temporarily, create a slot having the same name of the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# it will be stuck on data sync as slot creation will fail because the slot already exists.
+$node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
--
1.8.3.1
On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE, is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where during the catchup phase we have received few commits and
then the tablesync worker is crashed/errored out? Basically, check if
the replication is continued from the same point?
I have tested this and it didn't work, see the below example.
Publisher-side
================
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
During debug, stop after we mark FINISHEDCOPY state.
Publisher-side
================
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl1(somedata, text) VALUES (1, 4);
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit (which will be for values (1,3)), note
down the origin position in apply_handle_commit_internal and somehow
error out. I have forced the debugger to jump to the last line in
apply_dispatch where the error is raised.
- After the error, again the tablesync worker is restarted and it
starts from the position noted in the previous step
- It exits without replaying the WAL for (1,4)
So, on the subscriber-side, you will see 3 records; the fourth is missing.
Now, if you insert more records on the publisher, it will anyway
replay those, but the fourth one is missing.
The temporary slots didn't seem to work because we created the new
temporary slot again after the crash and asked it to start decoding from
the point we noted in origin_lsn. The publisher didn’t hold the
required WAL as our slot was temporary so it started sending from some
later point. We retain WAL based on the slot's restart_lsn position and
wal_keep_size. For our case, the positions of the slots will matter
and as we have created temporary slots, there is no way for a
publisher to save that WAL.
In this particular case, even if the WAL had been there, we only
pass the start_decoding_at position but didn’t pass restart_lsn, so it
picked a random location (current insert position in WAL) which is
ahead of start_decoding_at point so it never sent the required fourth
record. Now, I don’t think it will work even if we somehow sent the
correct restart_lsn because of what I wrote earlier that there is no
guarantee that the earlier WAL would have been saved.
At this point, I can't think of any way to fix this problem except for
going back to the previous approach of permanent slots but let me know
if you have any ideas to salvage this approach?
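For reference, the retention behaviour is easy to observe on the publisher
while the tablesync slot still exists; the LIKE pattern below is only a
sketch matching the generated tablesync slot names:

SELECT slot_name, slot_type, temporary, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE '%\_sync\_%';

A row with temporary = true is exactly the problem case described above:
the publisher retains WAL from that slot's restart_lsn only for as long as
the slot (and the session that created it) survives.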
OK. The latest patch [v21] now restores the permanent slot (and slot
cleanup) approach as it was implemented in an earlier version [v17].
Please note that this change also re-introduces some potential slot
cleanup problems for some race scenarios. These will be addressed by
future patches.
----
[v17] /messages/by-id/CAHut+Pt9+g8qQR0kMC85nY-O4uDQxXboamZAYhHbvkebzC9fAQ@mail.gmail.com
[v21] /messages/by-id/CAHut+PvzHRRA_5O0R8KZCb1tVe1mBVPxFtmttXJnmuOmAegoWA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 28, 2021 at 12:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE, is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where during the catchup phase we have received few commits and
then the tablesync worker is crashed/errored out? Basically, check if
the replication is continued from the same point?
I have tested this and it didn't work, see the below example.
Publisher-side
================
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
During debug, stop after we mark FINISHEDCOPY state.
Publisher-side
================
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl1(somedata, text) VALUES (1, 4);
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit (which will be for values (1,3)), note
down the origin position in apply_handle_commit_internal and somehow
error out. I have forced the debugger to jump to the last line in
apply_dispatch where the error is raised.
- After the error, again the tablesync worker is restarted and it
starts from the position noted in the previous step
- It exits without replaying the WAL for (1,4)
So, on the subscriber-side, you will see 3 records; the fourth is missing.
Now, if you insert more records on the publisher, it will anyway
replay those, but the fourth one is missing.
...
At this point, I can't think of any way to fix this problem except for
going back to the previous approach of permanent slots but let me know
if you have any ideas to salvage this approach?OK. The latest patch [v21] now restores the permanent slot (and slot
cleanup) approach as it was implemented in an earlier version [v17].
Please note that this change also re-introduces some potential slot
cleanup problems for some race scenarios.
I am able to reproduce the race condition where the slot/origin will
remain on the publisher node even when the corresponding subscription
is dropped. Basically, if we error out in the 'catchup' phase in the
tablesync worker, then either it will restart and clean up the
slot/origin, or, if in the meantime we have dropped the subscription
and stopped the apply worker, the slot and origin will probably be left
dangling on the publisher.
I have used exactly the same test procedure as was used to expose the
problem with the temporary slots, with some minor changes as mentioned
below:
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
- Have a while(1) loop in wait_for_relation_state_change so that we
can control apply worker via debugger at the right time.
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit, somehow error out. I have forced the
debugger to jump to the last line in apply_dispatch where the error is
raised.
- Now, the table sync worker won't restart because the apply worker is
looping in wait_for_relation_state_change.
- Execute DropSubscription;
- We can allow apply worker to continue by skipping the while(1) and
it will exit because DropSubscription would have sent a terminate
signal.
After the above steps, check the publisher (select * from
pg_replication_slots) and you will find the dangling tablesync slot.
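For example, a check like the following on the publisher (the LIKE pattern
is a sketch based on the tablesync slot naming used by the patch) should
return no rows once the subscription is dropped; any row it does return is
such a dangling tablesync slot:

SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE '%\_sync\_%';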
I think to solve the above problem we should drop the tablesync
slot/origin at Drop/Alter Subscription time, and additionally we need
to ensure that the apply worker doesn't let tablesync workers restart
(or that they do no work that accesses the slot, because the slots are
dropped) once we have stopped them. To ensure that, I think we need to
make the following changes:
1. Take AccessExclusiveLock on pg_subscription_rel during Alter (before
calling RemoveSubscriptionRel) and don't release it till transaction
end (do table_close with NoLock) similar to DropSubscription.
2. Take share lock (AccessShareLock) in GetSubscriptionRelState (it
gets called from LogicalRepSyncTableStart); we can release this lock
at the end of that function. This will ensure that even if the
tablesync worker is restarted, it will be blocked till the transaction
performing Alter will commit.
3. Make the Alter command not run in a transaction block so that we
don't keep locks for a longer time, and also to handle the slot-related
stuff, similar to DropSubscription.
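With locks (1) and (2) in place, the expectation is that a re-launched
tablesync worker blocks in GetSubscriptionRelState until the transaction
performing the Alter/Drop commits. As a sketch, the wait should be visible
from a third session with a query like:

SELECT pid, mode, granted
FROM pg_locks
WHERE relation = 'pg_subscription_rel'::regclass;

where the tablesync worker's AccessShareLock request shows granted = false
while the AccessExclusiveLock is still held.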
Few comments on v21:
===================
1.
DropSubscription()
{
..
- /* Clean up dependencies */
+ /* Clean up dependencies. */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
..
}
The above change seems unnecessary w.r.t. the current patch.
2.
DropSubscription()
{
..
/*
- * If there is no slot associated with the subscription, we can finish
- * here.
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
*/
- if (!slotname)
+ if (slotname)
{
- table_close(rel, NoLock);
- return;
..
}
What is the reason for this change? Can't we keep the check in its
existing form?
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v22 patch for the Tablesync Solution1.
Differences from v21:
+ Patch is rebased to latest OSS HEAD @ 29/Jan.
+ Includes new code as suggested [ak0128] to ensure no dangling slots
at Drop/AlterSubscription.
+ Removes the slot/origin cleanup done by the process interrupt logic
(cleanup_at_shutdown function).
+ Addresses some minor review comments.
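Note that with this patch the tablesync slot name is simply
pg_<suboid>_sync_<relid> (the subscription's slot name is no longer used
as a prefix), so the expected slot name can be computed directly from the
catalogs; e.g. for the subscription and table used in the TAP test:

SELECT 'pg_' || s.oid || '_sync_' || 'tab_rep_next'::regclass::oid
       AS tablesync_slot_name
FROM pg_subscription s
WHERE s.subname = 'tap_sub';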
----
[ak0128] /messages/by-id/CAA4eK1LMYXZY1SpzgW-WyFdy+FTMZ4BMz1dj0rT2rxGv-zLwFA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v22-0001-Tablesync-Solution1.patch
From 5d55ec2cc4b93e33f50d2a73023b2e4557edee79 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 29 Jan 2021 20:38:29 +1100
Subject: [PATCH v22] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by process_syncing_tables_for_sync functions.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesync slots/origins
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/catalog/pg_subscription.c | 5 +
src/backend/commands/subscriptioncmds.c | 384 ++++++++++++++++++++--------
src/backend/replication/logical/tablesync.c | 294 +++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 2 +
src/test/subscription/t/004_sync.pl | 69 ++++-
13 files changed, 628 insertions(+), 178 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826..920a39d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..303791d 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -337,6 +337,9 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +366,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..b9ecf04 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -566,107 +567,154 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ Relation rel;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
-
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
-
/*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
*/
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ PG_TRY();
{
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- pubrel_local_oids[off++] = relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
- {
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ pubrel_local_oids[off++] = relid;
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ for (off = 0; off < list_length(subrel_states); off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid relid = subrel_local_oids[off];
+
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop_at_commit(sub->oid, relid);
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ if (state != SUBREL_STATE_SYNCDONE && state != SUBREL_STATE_READY)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(sub->oid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */ );
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +896,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +927,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +980,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,39 +1094,19 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
-
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
-
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
-
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
- {
- table_close(rel, NoLock);
- return;
- }
-
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
+ *
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is as per PG docs so the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
+ if (wrconn == NULL && slotname != NULL)
ereport(ERROR,
(errmsg("could not connect to publisher when attempting to "
"drop the replication slot \"%s\"", slotname),
@@ -1085,31 +1117,159 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
PG_TRY();
{
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ if (wrconn)
+ {
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ else
+ {
+ /*
+ * It is possible to reach here without ERROR for a broken
+ * publisher connection only if the subscription slotname
+ * is set NONE/NULL.
+ *
+ * This means the user has disassociated the subscription
+ * from the replication slot deliberately so that the DROP
+ * SUBSCRIPTION can proceed to completion. See PG docs
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only give a WARNING message that
+ * the tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP SUBSCRIPTION
+ * from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(WARNING,
+ "no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ }
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */ );
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
+
+ PG_TRY();
+ {
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..165086a 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -113,6 +123,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -270,30 +316,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +483,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own origin
+ * then that would prevent the origin from advancing properly
+ * on the commit of the transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +894,40 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ /*
+ * Note: Since now we are using PERMANENT tablesync slots this code is not
+ * using the Subscription slot name as the first part of the tablesync
+ * slot name anymore. This part is omitted because we are now responsible
+ * for cleaning up the permanent tablesync slots, so it could become
+ * impossible to recalculate what name to clean up if the Subscription slot
+ * name had changed.
+ */
+
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +944,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +971,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +987,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1029,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1054,97 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN obtained from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during the table copy then clean up the
+ * created slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..67bc911 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -84,6 +84,8 @@ extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..963a7ee 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,6 +151,71 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# drop the table tab_rep from publisher and subscriber
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep;");
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep;");
+
+# recreate the subscription again, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
# we need to construct the name of the tablesync slot; for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
# the name of the tablesync slot has the form pg_<suboid>_sync_<tableoid>.
+my $slotname = 'pg_' . $subid . '_' . 'sync_' . $relid;
+
# temporarily create a slot having the same name as the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
# it will be stuck on data sync, as slot creation will fail because the slot already exists.
+$node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
# now drop the offending slot; the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
--
1.8.3.1
v22-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v22-0002-Tablesync-extra-logging.patchDownload
From 1a1d1a18b999c45fd546332acec83690f1d39b6e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 29 Jan 2021 21:29:31 +1100
Subject: [PATCH v22] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 16 +++++++++++
src/backend/replication/logical/tablesync.c | 44 +++++++++++++++++++++++++++--
2 files changed, 58 insertions(+), 2 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b9ecf04..ed55934 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -684,12 +684,20 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
char syncslotname[NAMEDATALEN] = {0};
ReplicationSlotNameForTablesync(sub->oid, relid, syncslotname);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropping sync slot \"%s\"",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped sync slot \"%s\"",
+ syncslotname);
}
/*
* Drop the tablesync's origin tracking if exists.
*/
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: call tablesync_replorigin_drop");
tablesync_replorigin_drop(sub->oid, relid, false /* nowait */ );
ereport(DEBUG1,
@@ -1150,7 +1158,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
if (wrconn)
{
+ elog(LOG,
+ "!!>> DropSubscription: dropping sync slot \"%s\"",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ elog(LOG,
+ "!!>> DropSubscription: dropped sync slot \"%s\"",
+ syncslotname);
}
else
{
@@ -1181,6 +1195,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
/*
* Drop the tablesync's origin tracking if exists.
*/
+ elog(LOG,
+ "!!>> DropSubscription: call tablesync_replorigin_drop");
tablesync_replorigin_drop(subid, relid, false /* nowait */ );
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 165086a..9cc3cdf 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -145,7 +145,13 @@ tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
{
PG_TRY();
{
+ elog(LOG,
+ "!!>> tablesync_replorigin_drop: droppping origin OID %d, named \"%s\"",
+ originid, originname);
replorigin_drop(originid, nowait);
+ elog(LOG,
+ "!!>> tablesync_replorigin_drop: dropped origin OID %d, named \"%s\"",
+ originid, originname);
}
PG_CATCH();
{
@@ -338,7 +344,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
*/
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -492,6 +504,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
* origin then it would prevent the origin from advancing properly
* on commit TX.
*/
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: call tablesync_replorigin_drop");
tablesync_replorigin_drop(MyLogicalRepWorker->subid,
rstate->relid, false /* nowait */ );
@@ -999,6 +1013,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
@@ -1006,8 +1022,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
CommitTransactionCommand();
@@ -1063,6 +1085,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* used for the catchup phase after COPY is done, so tell it to use
* the snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1094,13 +1119,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* WAL logged for the purpose of recovery. Locks are to prevent
* the replication origin from vanishing while advancing.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1129,7 +1163,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1140,8 +1180,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
- "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
On Thu, Jan 28, 2021 at 9:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jan 28, 2021 at 12:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE? Is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where during the catchup phase we have received a few commits and
then the tablesync worker crashes/errors out? Basically, check if
the replication is continued from the same point?
I have tested this and it didn't work, see the below example.
Publisher-side
================
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
During debug, stop after we mark FINISHEDCOPY state.
Publisher-side
================
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl1(somedata, text) VALUES (1, 4);
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit (which will be for values (1,3)), note
down the origin position in apply_handle_commit_internal and somehow
error out. I have forced the debugger to set to the last line in
apply_dispatch where the error is raised.
- After the error, again the tablesync worker is restarted and it
starts from the position noted in the previous step
- It exits without replaying the WAL for (1,4).
So, on the subscriber side, you will see 3 records. The fourth is missing.
Now, if you insert more records on the publisher, it will replay those
anyway, but the fourth one remains missing.
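For reference, a minimal way to confirm the symptom (a sketch; it assumes
the mytbl1 example above) is to compare the two nodes directly:

Publisher-side
================
SELECT count(*) FROM mytbl1;  -- returns 4

Subscriber-side
================
SELECT count(*) FROM mytbl1;  -- returns 3 with the bug; row (1,4) never arrives
SELECT somedata, text FROM mytbl1 ORDER BY id;  -- (1,3) is the last row seen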
At this point, I can't think of any way to fix this problem except for
going back to the previous approach of permanent slots but let me know
if you have any ideas to salvage this approach?
OK. The latest patch [v21] now restores the permanent slot (and slot
cleanup) approach as it was implemented in an earlier version [v17].
Please note that this change also re-introduces some potential slot
cleanup problems for some race scenarios.
I am able to reproduce the race condition where slot/origin will
remain on the publisher node even when the corresponding subscription
is dropped. Basically, if we error out in the 'catchup' phase in
tablesync worker then either it will restart and cleanup slot/origin
or if in the meantime we have dropped the subscription and stopped
apply worker then probably the slot and origin will be dangling on the
publisher.
I have used exactly the same test procedure as was used to expose the
problem in the temporary slots with some minor changes as mentioned
below:
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
- Have a while(1) loop in wait_for_relation_state_change so that we
can control apply worker via debugger at the right time.
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit, somehow error out. I have forced the
debugger to set to the last line in apply_dispatch where the error is
raised.
- Now, the table sync worker won't restart because the apply worker is
looping in wait_for_relation_state_change.
- Execute DropSubscription;
- We can allow apply worker to continue by skipping the while(1) and
it will exit because DropSubscription would have sent a terminate
signal.
After the above steps, check the publisher (select * from
pg_replication_slots) and you will find the dangling tablesync slot.
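A query like the following can spot such leftovers (a sketch; the pattern
matches the pg_%u_sync_%u naming used by the patch, and the example slot
name is hypothetical):

Publisher-side
================
SELECT slot_name, slot_type, active
FROM pg_replication_slots
WHERE slot_name LIKE 'pg\_%\_sync\_%' ESCAPE '\';
-- a leftover row such as pg_16394_sync_16385 is a dangling tablesync slot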
I think to solve the above problem we should drop tablesync
slot/origin at the Drop/Alter Subscription time and additionally we
need to ensure that apply worker doesn't let tablesync workers restart
(or it must not do any work to access the slot because the slots are
dropped) once we stopped them. To ensure that, I think we need to make
the following changes:
1. Take AccessExclusiveLock on subscription_rel during Alter (before
calling RemoveSubscriptionRel) and don't release it till transaction
end (do table_close with NoLock) similar to DropSubscription.
2. Take share lock (AccessShareLock) in GetSubscriptionRelState (it
gets called from logicalrepsyncstartworker), we can release this lock
at the end of that function. This will ensure that even if the
tablesync worker is restarted, it will be blocked till the transaction
performing Alter will commit.
3. Make the Alter command not run in a transaction block so that we
don't keep locks for a longer time and, second, because of the
slot-related stuff, similar to DropSubscription.
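To illustrate the locking idea in points 1 and 2, here is a minimal SQL
sketch (not the patch itself) that can be run in two subscriber sessions;
it uses the tab_rep_next table from the test above:

Session 1 (stands in for the Alter transaction holding the lock):
BEGIN;
LOCK TABLE pg_subscription_rel IN ACCESS EXCLUSIVE MODE;

Session 2 (stands in for a restarted tablesync worker reading its state):
SELECT srsubstate FROM pg_subscription_rel
WHERE srrelid = 'tab_rep_next'::regclass;
-- blocks here until session 1 ends its transaction

Session 1:
COMMIT;
-- session 2 now proceeds and sees the final state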
OK. The latest patch [v22] changes the code as suggested above.
Few comments on v21:
===================
1.
DropSubscription()
{
..
- /* Clean up dependencies */
+ /* Clean up dependencies. */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
..
}
The above change seems unnecessary w.r.t. the current patch.
OK. Modified in patch [v22].
2.
DropSubscription()
{
..
 /*
- * If there is no slot associated with the subscription, we can finish
- * here.
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
 */
- if (!slotname)
+ if (slotname)
 {
- table_close(rel, NoLock);
- return;
..
}
What is the reason for this change? Can't we keep the check in its
existing form?
I think the above comment is no longer applicable in the latest patch [v22].
Early exit for null slotname is not desirable anymore; we still need
to process all the tablesync slots/origins regardless.
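For example (a sketch using the tap_sub subscription from the test above),
even after the main slot is disassociated, the drop path must still visit
the not-ready relations to clean up their tablesync slots and origins:

ALTER SUBSCRIPTION tap_sub DISABLE;
ALTER SUBSCRIPTION tap_sub SET (slot_name = NONE);
DROP SUBSCRIPTION tap_sub;  -- tablesync slot/origin cleanup still attempted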
----
[v22] /messages/by-id/CAHut+PtrAVrtjc8srASTeUhbJtviw0Up-bzFSc14Ss=mAMxz9g@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Fri, Jan 29, 2021 at 4:07 PM Peter Smith <smithpb2250@gmail.com> wrote:
Differences from v21:
+ Patch is rebased to latest OSS HEAD @ 29/Jan.
+ Includes new code as suggested [ak0128] to ensure no dangling slots at Drop/AlterSubscription.
+ Removes the slot/origin cleanup done by the process interrupt logic (cleanup_at_shutdown function).
+ Addresses some minor review comments.
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
2. In AlterSubscription_refresh(), we can't allow workers to be
stopped at commit time as we have already dropped the slots because
the worker can access the dropped slot. We need to stop the workers
before dropping slots. This makes all the code related to
logicalrep_worker_stop_at_commit redundant.
3. In AlterSubscription_refresh(), we need to acquire the lock on
pg_subscription_rel only when we try to remove any subscription rel.
4. Added/Changed quite a few comments.
--
With Regards,
Amit Kapila.
Attachments:
v23-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v23-0001-Tablesync-Solution1.patchDownload
From 6f6cc3efafb8e959b27083429e32ec5165230527 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 30 Jan 2021 10:21:28 +0530
Subject: [PATCH v23] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by process_syncing_tables_for_sync functions.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesyc slots/origins
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 5 +
src/backend/commands/subscriptioncmds.c | 382 +++++++++++++++-----
src/backend/replication/logical/launcher.c | 147 --------
src/backend/replication/logical/tablesync.c | 294 ++++++++++++---
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/test/subscription/t/004_sync.pl | 69 +++-
16 files changed, 641 insertions(+), 324 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..920a39dfa9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..20cdd5715d 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..303791d580 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -337,6 +337,9 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +366,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..b15964e462 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -566,107 +567,175 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ Relation rel;
+ bool sub_rel_locked = false;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
-
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!sub_rel_locked)
+ {
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+ sub_rel_locked = true;
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
- {
- RemoveSubscriptionRel(sub->oid, relid);
+ RemoveSubscriptionRel(sub->oid, relid);
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ logicalrep_worker_stop(sub->oid, relid);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * slot and origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE state we know the tablesync slot has already
+ * been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ if (state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(sub->oid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+ }
+
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (sub_rel_locked)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +917,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +948,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1001,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1115,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1054,34 +1152,114 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
if (originid != InvalidRepOriginId)
replorigin_drop(originid, false);
+
/*
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ }
+
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason, we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1267,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196fd7..165086ad66 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -112,6 +122,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
+/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
/*
* Exit routine for synchronization worker.
*/
@@ -270,30 +316,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +483,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own
+ * origin then it would prevent the origin from advancing properly
+ * on commit TX.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +893,40 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ /*
+ * Note: Since now we are using PERMANENT tablesync slots this code is not
+ * using the Subscription slot name as the first part of the tablesync
+ * slot name anymore. This part is omitted because we are now responsible
+ * for cleaning up the permanent tablesync slots, so it could become
+ * impossible to recalculate what name to cleanup if the Subscription slot
+ * name had changed.
+ */
+
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -824,6 +944,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +971,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +987,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1029,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1054,97 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9f16..9027c42976 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5c49da7d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9181..963a7ee4dc 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,6 +151,71 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# drop the table tab_rep from publisher and subscriber
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep;");
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep;");
+
+# recreate the subscription again, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is 'pg_'suboid'_sync_'tableoid'.
+my $slotname = 'pg_' . $subid . '_' . 'sync_' . $relid;
+
+# temporarily, create a slot having the same name of the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# it will be stuck on data sync as slot create will fail because slot already exists.
+$node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
--
2.28.0.windows.1
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable. But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
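To make the suggestion concrete, here is a minimal standalone sketch (a toy model with made-up OIDs, not the patch's code) of the kind of per-slot logging loop v22 performed even when wrconn == NULL, using the generated tablesync slot name format:

#include <stdio.h>

/* Toy model: when the publisher connection is gone, still walk the
 * not-ready relations and log each generated tablesync slot name
 * ("pg_%u_sync_%u") so the user knows what to drop manually. The OIDs
 * below are invented for illustration. */
typedef struct
{
    unsigned int suboid;
    unsigned int relid;
} SyncRel;

int
main(void)
{
    SyncRel rstates[] = {{16394, 16385}, {16394, 16390}};
    int     i;

    for (i = 0; i < 2; i++)
        fprintf(stderr,
                "WARNING: no connection; cannot drop tablesync slot \"pg_%u_sync_%u\"\n",
                rstates[i].suboid, rstates[i].relid);
    return 0;
}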
2. In AlterSubscription_refresh(), we can't allow workers to be
stopped at commit time as we have already dropped the slots because
the worker can access the dropped slot. We need to stop the workers
before dropping slots. This makes all the code related to
logicalrep_worker_stop_at_commit redundant.
OK.
3. In AlterSubscription_refresh(), we need to acquire the lock on
pg_subscription_rel only when we try to remove any subscription rel.
+ if (!sub_rel_locked)
+ {
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+ sub_rel_locked = true;
+ }
OK. But the sub_rel_locked bool is not really necessary. Why not just
check for rel == NULL? e.g.
if (!rel)
rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
4. Added/Changed quite a few comments.
@@ -1042,6 +1115,31 @@ DropSubscription(DropSubscriptionStmt *stmt,
bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (slots and origins).
The comment is misleading; this code is only dropping origins.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
2. In AlterSubscription_refresh(), we can't allow workers to be
stopped at commit time as we have already dropped the slots because
the worker can access the dropped slot. We need to stop the workers
before dropping slots. This makes all the code related to
logicalrep_worker_stop_at_commit redundant.
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
Since v23 removes that typedef from the code, don't you also have to
remove it from src/tools/pgindent/typedefs.list?
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
Okay, then can we think of combining it with the existing replication
slot error? I think that might produce a very long message, so
another idea could be to LOG a separate WARNING for each such slot
just before giving the error.
2. In AlterSubscription_refresh(), we can't allow workers to be
stopped at commit time as we have already dropped the slots because
the worker can access the dropped slot. We need to stop the workers
before dropping slots. This makes all the code related to
logicalrep_worker_stop_at_commit redundant.
OK.
3. In AlterSubscription_refresh(), we need to acquire the lock on
pg_subscription_rel only when we try to remove any subscription rel.
+ if (!sub_rel_locked)
+ {
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+ sub_rel_locked = true;
+ }
OK. But the sub_rel_locked bool is not really necessary. Why not just
check for rel == NULL? e.g.
if (!rel)
rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
Okay, that seems to be better, will change accordingly.
4. Added/Changed quite a few comments.
@@ -1042,6 +1115,31 @@ DropSubscription(DropSubscriptionStmt *stmt,
bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (slots and origins).
The comment is misleading; this code is only dropping origins.
Okay, will change to something like: "Cleanup of tablesync replication origins."
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
Since v23 removes that typedef from the code, don't you also have to
remove it from src/tools/pgindent/typedefs.list?
Yeah.
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot. By
saying missing_ok = true it means DropSubscription would not give
ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail
with an unexpected error.
But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
Okay, then can we think of combining with the existing error of the
replication slot? I think that might produce a very long message, so
another idea could be to LOG a separate WARNING for each such slot
just before giving the error.
There may be many subscribed tables so I agree combining into one
message might be too long. Yes, we can add another loop to output the
necessary information. But, isn’t logging each tablesync slot WARNING
before the subscription slot ERROR exactly the behaviour which already
existed in v22? IIUC the DropSubscription refactoring in v23 was not
done for any functional change, but was intended only to make the code
simpler, but how is that goal achieved if v23 ends up needing 3 loops
where v22 only needed 1 loop to do the same thing?
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
By
saying missing_ok = true it means DropSubscription would not give
ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail
with an unexpected error.
But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
Okay, then can we think of combining with the existing error of the
replication slot? I think that might produce a very long message, so
another idea could be to LOG a separate WARNING for each such slot
just before giving the error.
There may be many subscribed tables so I agree combining to one
message might be too long. Yes, we can add another loop to output the
necessary information. But, isn’t logging each tablesync slot WARNING
before the subscription slot ERROR exactly the behaviour which already
existed in v22. IIUC the DropSubscription refactoring in V23 was not
done for any functional change, but was intended only to make the code
simpler, but how is that goal achieved if v23 ends up needing 3 loops
where v22 only needed 1 loop to do the same thing?
No, there is a functionality change as well. The way we have code in
v22 can easily lead to a problem when we have dropped the slots but
get an error while removing origins or an entry from subscription rel.
In such cases, we won't be able to roll back the drop of slots, but the
other database operations will be rolled back. This is the reason we
have to drop the slots at the end. We need to ensure the same thing
for AlterSubscription_refresh. Does this make sense now?
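To make that ordering concrete, here is a minimal standalone sketch (a toy model of my own, not the patch's code) of why the irreversible remote slot drop has to come after the rollback-able catalog work:

#include <stdbool.h>
#include <stdio.h>

/* Toy model: catalog changes participate in the local transaction and
 * are undone on error; a remote DROP_REPLICATION_SLOT is not, so it is
 * deferred until all catalog work has succeeded. */
static bool
remove_catalog_entries(void)
{
    /* e.g. RemoveSubscriptionRel() or an origin drop; pretend it fails */
    return false;
}

int
main(void)
{
    if (!remove_catalog_entries())
    {
        /* transaction aborts: catalog changes roll back, and the slots
         * were never touched, so nothing is left half-done */
        fprintf(stderr, "ERROR: catalog cleanup failed; rolled back\n");
        return 1;
    }

    /* only now perform the irreversible remote operation */
    printf("DROP_REPLICATION_SLOT pg_16394_sync_16385 WAIT\n");
    return 0;
}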
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 10:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen?
I think it is possible that the state is still not SYNCDONE but the
slot is already dropped, so here we should be prepared for the slot
to be missing.
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
OK. I think I was forgetting the logicalrep_worker_stop would also go
into a loop waiting for the worker process to die. So even if the
tablesync worker does simultaneously drop its own slot, I think it
will certainly at least be in SYNCDONE state before DropSubscription
does anything else with that worker.
By
saying missing_ok = true it means DropSubscription would not give
ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail
with an unexpected error.
But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
Okay, then can we think of combining with the existing error of the
replication slot? I think that might produce a very long message, so
another idea could be to LOG a separate WARNING for each such slot
just before giving the error.
There may be many subscribed tables so I agree combining to one
message might be too long. Yes, we can add another loop to output the
necessary information. But, isn’t logging each tablesync slot WARNING
before the subscription slot ERROR exactly the behaviour which already
existed in v22. IIUC the DropSubscription refactoring in V23 was not
done for any functional change, but was intended only to make the code
simpler, but how is that goal achieved if v23 ends up needing 3 loops
where v22 only needed 1 loop to do the same thing?
No, there is a functionality change as well. The way we have code in
v22 can easily lead to a problem when we have dropped the slots but
get an error while removing origins or an entry from subscription rel.
In such cases, we won't be able to rollback the drop of slots but the
other database operations will be rolled back. This is the reason we
have to drop the slots at the end. We need to ensure the same thing
for AlterSubscription_refresh. Does this make sense now?
OK.
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Mon, Feb 1, 2021 at 11:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
OK. I think I was forgetting the logicalrep_worker_stop would also go
into a loop waiting for the worker process to die. So even if the
tablesync worker does simultaneously drop its own slot, I think it
will certainly at least be in SYNCDONE state before DropSubscription
does anything else with that worker.
How is that ensured? We don't have anything like HOLD_INTERRUPTS
between the time we dropped the slot and updated the rel state to
SYNCDONE. So, isn't it possible that after we dropped the slot and
before we update the state, the SIGTERM signal arrives and leads to
worker exit?
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
OK. I think I was forgetting the logicalrep_worker_stop would also go
into a loop waiting for the worker process to die. So even if the
tablesync worker does simultaneously drop its own slot, I think it
will certainly at least be in SYNCDONE state before DropSubscription
does anything else with that worker.
How is that ensured? We don't have anything like HOLD_INTERRUPTS
between the time we dropped the slot and updated the rel state to
SYNCDONE. So, isn't it possible that after we dropped the slot and
before we update the state, the SIGTERM signal arrives and leads to
worker exit?
The worker has the SIGTERM handler of "die". IIUC the "die" function
doesn't normally do anything except set some flags to say please die
at the next convenient opportunity. My understanding is that the
worker process will not actually exit until it next executes
CHECK_FOR_INTERRUPTS(), whereupon it will see the ProcDiePending flag
and *really* die. So even if the SIGTERM signal arrives immediately
after the slot is dropped, the tablesync will still become SYNCDONE.
Is this understanding wrong?
But your scenario could still be possible if "die" exited immediately
(e.g. only in single user mode?).
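For reference, a minimal standalone sketch of the flag-and-check pattern described above (the names are simplified stand-ins; the real handler is die() in tcop/postgres.c and the check point is the CHECK_FOR_INTERRUPTS() macro):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static volatile sig_atomic_t die_pending = 0;

/* like die(): only record the request; never exit inside the handler */
static void
die_handler(int signo)
{
    (void) signo;
    die_pending = 1;
}

/* like CHECK_FOR_INTERRUPTS(): the actual exit happens here */
static void
check_for_interrupts(void)
{
    if (die_pending)
    {
        fprintf(stderr, "FATAL: terminating at the next safe point\n");
        exit(1);
    }
}

int
main(void)
{
    signal(SIGTERM, die_handler);

    /* drop_tablesync_slot();      <- a SIGTERM arriving here ...      */
    /* update_state_to_syncdone(); <- ... does not by itself skip this */

    check_for_interrupts();        /* ... unless a CFI runs in between */
    return 0;
}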
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Feb 1, 2021 at 1:08 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
OK. I think I was forgetting the logicalrep_worker_stop would also go
into a loop waiting for the worker process to die. So even if the
tablesync worker does simultaneously drop its own slot, I think it
will certainly at least be in SYNCDONE state before DropSubscription
does anything else with that worker.
How is that ensured? We don't have anything like HOLD_INTERRUPTS
between the time we dropped the slot and updated the rel state to
SYNCDONE. So, isn't it possible that after we dropped the slot and
before we update the state, the SIGTERM signal arrives and leads to
worker exit?
The worker has the SIGTERM handler of "die". IIUC the "die" function
doesn't normally do anything except set some flags to say please die
at the next convenient opportunity. My understanding is that the
worker process will not actually exit until it next executes
CHECK_FOR_INTERRUPTS(), whereupon it will see the ProcDiePending flag
and *really* die. So even if the SIGTERM signal arrives immediately
after the slot is dropped, the tablesync will still become SYNCDONE.
Is this understanding wrong?
But your scenario could still be possible if "die" exited immediately
(e.g. only in single user mode?).
I think it is possible without that as well. There are many calls
in-between those two operations which can internally call
CHECK_FOR_INTERRUPTS. One of the flows where such a possibility exists
is UpdateSubscriptionRelState->SearchSysCacheCopy2->SearchSysCacheCopy->SearchSysCache->SearchCatCache->SearchCatCacheInternal->SearchCatCacheMiss->systable_getnext.
This can internally call heapgetpage where we have
CHECK_FOR_INTERRUPTS. I think even if there were no CFI call today, we
can't guarantee that for the future, as the calls used are quite
common. So, we probably need the missing_ok flag in DropSubscription.
One more point: in the tablesync code you are calling
ReplicationSlotDropAtPubNode with missing_ok as false. What if we get
an error after that and before we have marked the state as SYNCDONE? I
guess it will then always error out in ReplicationSlotDropAtPubNode
because we had already dropped the slot.
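A small standalone toy (assumptions mine, not the patch's code) showing why missing_ok matters once the drop/state-update window can be interrupted:

#include <stdbool.h>
#include <stdio.h>

/* Toy model: the slot is dropped first and the state updated second, so
 * an exit in between leaves a dropped slot with a not-yet-SYNCDONE
 * state; the relaunched worker then retries the drop. */
typedef enum { FINISHEDCOPY, SYNCDONE } RelState;

static bool slot_exists = true;
static RelState state = FINISHEDCOPY;

static void
drop_slot(bool missing_ok)
{
    if (!slot_exists)
    {
        if (missing_ok)
            printf("WARNING: slot already gone, continuing\n");
        else
            printf("ERROR: replication slot does not exist\n");
        return;
    }
    slot_exists = false;
}

int
main(void)
{
    drop_slot(false);           /* first attempt actually drops the slot */
    /* ... worker exits here, before state = SYNCDONE ... */

    if (state == FINISHEDCOPY)  /* relaunched worker retries the drop */
        drop_slot(true);        /* missing_ok = true avoids the error */
    return 0;
}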
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 11:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
No, there is a functionality change as well. The way we have code in
v22 can easily lead to a problem when we have dropped the slots but
get an error while removing origins or an entry from subscription rel.
In such cases, we won't be able to rollback the drop of slots but the
other database operations will be rolled back. This is the reason we
have to drop the slots at the end. We need to ensure the same thing
for AlterSubscription_refresh. Does this make sense now?
OK.
I have updated the patch to display a WARNING for each of the
tablesync slots during DropSubscription. As discussed, I have moved
the slot-drop code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in the tablesync code where,
after catching the exception, we were not clearing the transaction
state on the publisher; see the changes in LogicalRepSyncTableStart. I
have also addressed the other comments raised by you. Additionally, I
have removed the test because it was creating a slot with the same
name as the tablesync slot, and the tablesync worker removed it due to
the new logic in LogicalRepSyncTableStart. Earlier, it was not failing
because of the bug in that code, which I have fixed in the attached.
I wonder whether we should restrict creating slots with the prefix
"pg_" because we are internally creating slots with those names? I
think this was a problem previously also. We already prohibit it for a
few other objects like origins, schemas, etc.; see the usage of
IsReservedName.
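A hedged sketch of the kind of check being suggested, modelled loosely on IsReservedName() (the helper name below is invented, not an existing API):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reject user-created slot names that collide with
 * the internally generated "pg_%u_sync_%u" tablesync namespace. */
static bool
ReplicationSlotNameIsReserved(const char *name)
{
    return strncmp(name, "pg_", 3) == 0;
}

int
main(void)
{
    const char *name = "pg_16394_sync_16385";

    if (ReplicationSlotNameIsReserved(name))
        fprintf(stderr,
                "ERROR: replication slot name \"%s\" is reserved\n", name);
    return 0;
}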
--
With Regards,
Amit Kapila.
Attachments:
v24-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v24-0001-Tablesync-Solution1.patchDownload
From 83795d3fefa7a313d2c2e6ecdb46d861106fce40 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 30 Jan 2021 10:21:28 +0530
Subject: [PATCH v24] Tablesync Solution1.
==== Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by process_syncing_tables_for_sync functions.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesync slots/origins
* Updates to PG docs.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 5 +
src/backend/commands/subscriptioncmds.c | 442 ++++++++++++++++++++++------
src/backend/replication/logical/launcher.c | 147 ---------
src/backend/replication/logical/tablesync.c | 305 ++++++++++++++++---
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
16 files changed, 643 insertions(+), 325 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826..920a39d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3..3c8b4eb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..303791d 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -337,6 +337,9 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +366,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..46f8d70 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List* rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,191 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
-
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slot. This has to be at the end because otherwise if there
+ * is an error while doing the database operations we won't be able to rollback
+ * dropped slot.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +935,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +966,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1019,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1133,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1054,34 +1170,111 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
if (originid != InvalidRepOriginId)
replorigin_drop(originid, false);
+
/*
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason, we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1282,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1481,46 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that we have appropriate locks so that
+ * relstate doesn't change underneath us.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514c..58082dd 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -547,51 +533,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
}
/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
-/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
void
@@ -820,109 +761,21 @@ ApplyLauncherShmemInit(void)
}
/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
-/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
-/*
* Request wakeup of the launcher on commit of the transaction.
*
* This is used to send launcher signal to stop sleeping and process the
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..cc49dc4 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -113,6 +123,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -270,30 +316,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +483,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * In the normal case the origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function, because if the
+ * tablesync worker process attempted to drop its own
+ * origin, that would prevent the origin from advancing properly
+ * on the commit of the transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +894,40 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ /*
+ * Note: Now that we are using PERMANENT tablesync slots, this code no
+ * longer uses the Subscription slot name as the first part of the tablesync
+ * slot name. That part is omitted because we are now responsible
+ * for cleaning up the permanent tablesync slots, so it could become
+ * impossible to recalculate what name to clean up if the Subscription slot
+ * name had changed.
+ */
+
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +944,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +971,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +987,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1029,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1054,108 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * Cleanup the transaction state on publisher before performing any
+ * other operation.
+ */
+ res = walrcv_exec(wrconn, "ROLLBACK", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not rollback transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index ab1202c..e04ba83 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec15..301e494 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..4a5c49d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe..5f5c36d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
1.8.3.1
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code where in
after catching the exception we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you.
Here are some additional feedback comments about the v24 patch:
~~
ReportSlotConnectionError:
1,2,3,4.
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that we have appropriate locks so that
+ * relstate doesn't change underneath us.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+
+ }
+ }
1. I wonder if "rstates" would be better named something like
"not_ready_rstates", otherwise it is not apparent what states are in
this list
2. The comment "/* Only cleanup resources of tablesync workers */" is
not quite correct because there is no cleanup happening here. Maybe
change to:
if (!OidIsValid(relid))
continue; /* not a tablesync worker */
3. Maybe the "appropriate locks" comment can say what locks are the
"appropriate" ones?
4. Spurious blank line after the elog?
~~
AlterSubscription_refresh:
5.
+ /*
+ * Drop the tablesync slot. This has to be at the end because otherwise if there
+ * is an error while doing the database operations we won't be able to rollback
+ * dropped slot.
+ */
Maybe "Drop the tablesync slot." should say "Drop the tablesync slots
associated with removed tables."
~~
DropSubscription:
6.
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
I wonder if "rstates" would be better named as "not_ready_rstates",
because it is used in several places where not READY is assumed.
7.
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+
+ }
Spurious blank line above?
8.
The new logic of calling the ReportSlotConnectionError seems to be
expecting that the user has encountered some connection error, and
*after* that they have assigned slot_name = NONE as a workaround. In
this scenario the code looks ok since names of any dangling tablesync
slots were being logged at the time of the error.
But I am wondering what about where the user might have set slot_name
= NONE *before* the connection is broken. In this scenario, there is
no ERROR, so if there are still (is it possible?) dangling tablesync
slots, their names are never getting logged at all. So how can the
user know what to delete?
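As an aside: given the generated-name convention the patch documents, a
query along these lines, run on the publisher, could at least surface
candidate leftovers. The LIKE pattern is only an assumption based on
that convention; nothing in the patch provides this query.

SELECT slot_name
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND slot_name LIKE 'pg\_%\_sync\_%' ESCAPE '\';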
~~
Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.
Wasn't the point of that test to cause a tablesync slot clash and see
if it could recover? Why not just keep it, and modify the test to make
it work again? Isn't it still valuable, because at least it would
execute the code through the PG_CATCH which otherwise may not get
executed by any other test?
I wonder whether we should restrict creating slots with the prefix pg_
because we are internally creating slots with those names? I think
this was a problem previously also. We already prohibit it for a few
other objects like origins, schemas, etc., see the usage of
IsReservedName.
Yes, we could restrict the create slot API if you really wanted to.
But, IMO it is implausible that a user could "accidentally" clash with
the internal tablesync slot name, so in practice maybe this change
would not help much but it might make it more difficult to test some
scenarios.
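If we did want that restriction, I imagine it would just be a small
check at slot creation time, mirroring the existing IsReservedName
usage. A hypothetical sketch only (the helper name is invented for
illustration; this is not in the posted patch):

static void
check_slot_name_not_reserved(const char *name)
{
	/* IsReservedName returns true for names beginning with "pg_". */
	if (IsReservedName(name))
		ereport(ERROR,
				(errcode(ERRCODE_RESERVED_NAME),
				 errmsg("replication slot name \"%s\" is reserved", name),
				 errdetail("Slot names beginning with \"pg_\" are reserved for internal use.")));
}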
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Feb 2, 2021 at 8:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code wherein,
after catching the exception, we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you.

Here are some additional feedback comments about the v24 patch:
~~
ReportSlotConnectionError:
1,2,3,4.

+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that we have appropriate locks so that
+ * relstate doesn't change underneath us.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+
+ }
+ }

1. I wonder if "rstates" would be better named something like
"not_ready_rstates", otherwise it is not apparent what states are in
this list
I don't know if that would be better and it is used in the same way in
the existing code. I find the current naming succinct.
2. The comment "/* Only cleanup resources of tablesync workers */" is
not quite correct because there is no cleanup happening here. Maybe
change to:
if (!OidIsValid(relid))
continue; /* not a tablesync worker */
Aren't we trying to clean up the tablesync slots here? So, I don't see
the comment as irrelevant.
3. Maybe the "appropriate locks" comment can say what locks are the
"appropriate" ones?4. Spurious blank line after the elog?
Will fix both the above.
~~
AlterSubscription_refresh:
5.

+ /*
+ * Drop the tablesync slot. This has to be at the end because otherwise if there
+ * is an error while doing the database operations we won't be able to rollback
+ * dropped slot.
+ */

Maybe "Drop the tablesync slot." should say "Drop the tablesync slots
associated with removed tables."
makes sense, will fix.
~~
DropSubscription:
6.

+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)

I wonder if "rstates" would be better named as "not_ready_rstates",
because it is used in several places where not READY is assumed.
Same response as above for similar comment.
7.

+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+
+ }

Spurious blank line above?
Will fix.
8.
The new logic of calling the ReportSlotConnectionError seems to be
expecting that the user has encountered some connection error, and
*after* that they have assigned slot_name = NONE as a workaround. In
this scenario the code looks ok since names of any dangling tablesync
slots were being logged at the time of the error.

But I am wondering what about where the user might have set slot_name
= NONE *before* the connection is broken. In this scenario, there is
no ERROR, so if there are still (is it possible?) dangling tablesync
slots, their names are never getting logged at all. So how can the
user know what to delete?
It has been mentioned in the docs that the user is responsible for
cleaning that up manually in such a case. The patch has also described
how the names are generated, which can help the user to remove those.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
I think if the user changes the slot_name associated with the
subscription, it would be their responsibility to clean up the
previously associated slot. This is currently the case with the main
subscription slot as well. I think it won't be advisable for the user
to change slot_name except under some rare cases where the system
might be stuck, like the one for which we are giving a WARNING and
providing a hint for setting the slot_name to NONE.
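Since the name generation is deterministic, the expected tablesync slot
names can also be reconstructed from the subscriber's catalog. For
example (a sketch only; it lists the names for relations not yet READY,
while the slots themselves live on the publisher):

SELECT format('pg_%s_sync_%s', srsubid, srrelid) AS tablesync_slot_name
FROM pg_subscription_rel
WHERE srsubstate <> 'r';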
~~
Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.

Wasn't the point of that test to cause a tablesync slot clash and see
if it could recover? Why not just keep it, and modify the test to make
it work again?
We can do that, but my other worry was that we might want to reserve
slot names that start with pg_.
Isn't it still valuable because at least it would
execute the code through the PG_CATCH which otherwise may not get
executed by any other test?
It is valuable, but IIRC there was a test (in subscription/004_sync.pl)
where a PK violation happens during the copy, which will lead to
coverage of the code in the CATCH block.
I wonder whether we should restrict creating slots with the prefix pg_
because we are internally creating slots with those names? I think
this was a problem previously also. We already prohibit it for a few
other objects like origins, schemas, etc., see the usage of
IsReservedName.

Yes, we could restrict the create slot API if you really wanted to.
But, IMO it is implausible that a user could "accidentally" clash with
the internal tablesync slot name, so in practice maybe this change
would not help much but it might make it more difficult to test some
scenarios.
Isn't the same true for origins?
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code wherein,
after catching the exception, we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you. Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.
I was testing this patch. I had a table on the subscriber which had a
row that would cause a PK constraint violation during the table copy.
This results in the subscriber trying to roll back the table copy and
failing.
2021-02-01 23:28:16.041 EST [23738] LOG: logical replication apply
worker for subscription "tap_sub" has started
2021-02-01 23:28:16.051 EST [23740] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-01 23:28:21.118 EST [23740] ERROR: table copy could not
rollback transaction on publisher
2021-02-01 23:28:21.118 EST [23740] DETAIL: The error was: another
command is already in progress
2021-02-01 23:28:21.122 EST [8028] LOG: background worker "logical
replication worker" (PID 23740) exited with exit code 1
2021-02-01 23:28:21.125 EST [23908] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-01 23:28:21.138 EST [23908] ERROR: could not create
replication slot "pg_16398_sync_16384": ERROR: replication slot
"pg_16398_sync_16384" already exists
2021-02-01 23:28:21.139 EST [8028] LOG: background worker "logical
replication worker" (PID 23908) exited with exit code 1
2021-02-01 23:28:26.168 EST [24048] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-01 23:28:34.244 EST [24048] ERROR: table copy could not
rollback transaction on publisher
2021-02-01 23:28:34.244 EST [24048] DETAIL: The error was: another
command is already in progress
2021-02-01 23:28:34.251 EST [8028] LOG: background worker "logical
replication worker" (PID 24048) exited with exit code 1
2021-02-01 23:28:34.254 EST [24337] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-01 23:28:34.263 EST [24337] ERROR: could not create
replication slot "pg_16398_sync_16384": ERROR: replication slot
"pg_16398_sync_16384" already exists
2021-02-01 23:28:34.264 EST [8028] LOG: background worker "logical
replication worker" (PID 24337) exited with exit code 1
And one more thing I see is that now we error out in PG_CATCH() in
LogicalRepSyncTableStart() with the above error, and as a result, the
tablesync slot is not dropped, causing the slot creation to fail
on the next restart.
I think this can be avoided. We could either attempt a rollback only
on specific failures, or drop the slot prior to erroring out.
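For example, the PG_CATCH block could be reshaped so that the slot drop
still runs (a sketch of the second option only, reusing names from the
posted patch; this is not what the patch currently does):

PG_CATCH();
{
	/*
	 * Demote the publisher-side ROLLBACK failure to a WARNING so that
	 * the slot cleanup below still runs before the original error is
	 * re-thrown.
	 */
	res = walrcv_exec(wrconn, "ROLLBACK", 0, NULL);
	if (res->status != WALRCV_OK_COMMAND)
		ereport(WARNING,
				(errmsg("table copy could not rollback transaction on publisher"),
				 errdetail("The error was: %s", res->err)));
	walrcv_clear_result(res);

	/* Drop the tablesync slot even if the rollback did not succeed. */
	ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */ );

	PG_RE_THROW();
}
PG_END_TRY();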
regards,
Ajin Cherian
Fujitsu Australia
Another failure I see in my testing:

On the publisher, create a big enough table:

postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# INSERT INTO tab_rep SELECT generate_series(1,1000000);
INSERT 0 1000000
postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES;
CREATE PUBLICATION
Subscriber:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
Create the subscription but do not enable it:

postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
The below two commands on the subscriber should be issued quickly with
no delay between them.
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub disable;
ALTER SUBSCRIPTION
This leaves the below state for the pg_subscription rel:
postgres=# select * from pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+----------
16395 | 16384 | f |
(1 row)
The rel is in the SUBREL_STATE_FINISHEDCOPY state.
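For readability, the single-letter codes can be decoded with a query
like this (illustrative only; it uses just the state codes defined in
pg_subscription_rel.h, including the new 'f'):

SELECT srsubid, srrelid,
       CASE srsubstate
            WHEN 'i' THEN 'INIT'
            WHEN 'd' THEN 'DATASYNC'
            WHEN 'f' THEN 'FINISHEDCOPY'
            WHEN 's' THEN 'SYNCDONE'
            WHEN 'r' THEN 'READY'
       END AS state
FROM pg_subscription_rel;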
Meanwhile on the publisher, looking at the slots created:
postgres=# select * from pg_replication_slots;
 slot_name           | plugin   | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size
---------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
 tap_sub             | pgoutput | logical   |  13859 | postgres | f         | f      |            |      |          517 | 0/9303660   | 0/9303698           | reserved   |
 pg_16395_sync_16384 | pgoutput | logical   |  13859 | postgres | f         | f      |            |      |          517 | 0/9303660   | 0/9303698           | reserved   |
(2 rows)
There are two slots, the main slot as well as the tablesync slot. Drop
the table, re-enable the subscription, and then drop the subscription.

Now on the subscriber:
postgres=# drop table tab_rep;
DROP TABLE
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
postgres=# drop subscription tap_sub ;
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION
We see the tablesync slot dangling on the publisher:
postgres=# select * from pg_replication_slots;
 slot_name           | plugin   | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size
---------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
 pg_16395_sync_16384 | pgoutput | logical   |  13859 | postgres | f         | f      |            |      |          517 | 0/9303660   | 0/9303698           | reserved   |
(1 row)
The dropping of the table meant that after the tablesync is
restarted, the worker has no idea about the old slot it created, as
its name uses the relid of the dropped table.
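Until that is addressed, the only recourse seems to be to drop the
dangling slot manually on the publisher, e.g. using the slot name from
the output above:

postgres=# SELECT pg_drop_replication_slot('pg_16395_sync_16384');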
regards,
Ajin Cherian
Fujitsu Australia
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code where in
after catching the exception we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart.
...
I know that in another email [ac0202] Ajin has reported some problem
he found related to this new (LogicalRepSyncTableStart PG_CATCH) code
for some different use-case, but for my test scenario of a "broken
connection during a table copy" the code did appear to be working
properly.
PSA detailed logs which show the test steps and output for this
""broken connection during a table copy" scenario.
----
[ac0202] /messages/by-id/CAFPTHDaZw5o+wMbv3aveOzuLyz_rqZebXAj59rDKTJbwXFPYgw@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Tue, Feb 2, 2021 at 10:34 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code wherein,
after catching the exception, we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you. Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.

I was testing this patch. I had a table on the subscriber which had a
row that would cause a PK constraint violation during the table copy.
This results in the subscriber trying to roll back the table copy and
failing.

I am not getting this error. I have tried the below test:
Publisher
===========
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber
=============
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
It generates the PK violation the first time and then I removed the
conflicting rows in the subscriber and it passed. See logs below.
2021-02-02 13:51:34.316 IST [20796] LOG: logical replication table
synchronization worker for subscription "mysub", table "mytbl1" has
started
2021-02-02 13:52:43.625 IST [20796] ERROR: duplicate key value
violates unique constraint "mytbl1_pkey"
2021-02-02 13:52:43.625 IST [20796] DETAIL: Key (id)=(1) already exists.
2021-02-02 13:52:43.625 IST [20796] CONTEXT: COPY mytbl1, line 1
2021-02-02 13:52:43.695 IST [27840] LOG: background worker "logical
replication worker" (PID 20796) exited with exit code 1
2021-02-02 13:52:43.884 IST [6260] LOG: logical replication table
synchronization worker for subscription "mysub", table "mytbl1" has
started
2021-02-02 13:53:54.680 IST [6260] LOG: logical replication table
synchronization worker for subscription "mysub", table "mytbl1" has
finished
Also, a similar test exists in 004_sync.pl; is that also failing for
you? Can you please provide detailed steps that led to this failure?
And one more thing I see is that now we error out in PG_CATCH() in
LogicalRepSyncTableStart() with the above error and as a result, the
tablesync slot is not dropped. Hence causing the slot create to fail
in the next restart.
I think this can be avoided. We could either attempt a rollback only
on specific failures, or drop the slot prior to erroring out.
Hmm, we have to first roll back before attempting any other operation
because the transaction on the publisher is in an errored state.
--
With Regards,
Amit Kapila.
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
In my test this caused a stack trace within some logging, but I
imagine other bad things can happen if the tablesync worker can be
executed with an invalid relid.
Possibly this is an existing PG bug which has just never been seen
before; the ereport which has failed here is not new code.
PSA the log for the test steps and the stack trace details.
----
[ac0202] /messages/by-id/CAFPTHDYzjaNfzsFHpER9idAPB8v5j=SUbWL0AKj5iVy0BKbTpg@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Tue, Feb 2, 2021 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 10:34 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code wherein,
after catching the exception, we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you. Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.

I was testing this patch. I had a table on the subscriber which had a
row that would cause a PK constraint violation during the table copy.
This results in the subscriber trying to roll back the table copy and
failing.

I am not getting this error. I have tried the below test:
I am sorry, my above steps were not correct. I think the reason for
the failure I was seeing was some other steps I did prior to this. I
will recreate this and update you with the appropriate steps.
regards,
Ajin Cherian
Fujitsu Australia
On Tue, Feb 2, 2021 at 11:35 AM Ajin Cherian <itsajin@gmail.com> wrote:
Another failure I see in my testing
The problem here is that we are allowing the table to be dropped while
table synchronization is still in progress, and then we don't have any
way to know the corresponding slot or origin. I think we can try to
drop the slot and origin as well, but that is not a good idea because
slots, once dropped, won't be rolled back. So, I have added a fix to
disallow the drop of the table when table synchronization is still in
progress.
Apart from that, I have fixed comments raised by Peter as discussed
above and made some additional changes in comments, code (code changes
are cosmetic), and docs.
Let me know if the reported issue is fixed or not.
--
With Regards,
Amit Kapila.
Attachments:
v25-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v25-0001-Tablesync-Solution1.patchDownload
From 7859b5e48d8c7e725c98095174cf427eab8d9d18 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 30 Jan 2021 10:21:28 +0530
Subject: [PATCH v25] Tablesync Solution1.
==== Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by the process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesync slots/origins
* Updates to PG docs.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/alter_subscription.sgml | 6 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 31 ++
src/backend/commands/subscriptioncmds.c | 440 +++++++++++++++-----
src/backend/replication/logical/launcher.c | 147 -------
src/backend/replication/logical/tablesync.c | 297 +++++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
17 files changed, 665 insertions(+), 325 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..920a39dfa9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..20cdd5715d 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..a6ffd6688f 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,12 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ...</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with the
+ refresh option set to true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..08e339b9b1 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,9 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +367,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +409,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow dropping the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..220c7a08d5 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List* rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,191 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if a new worker for this particular rel is restarted, it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to roll back a dropped slot.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +935,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +966,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1019,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1133,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1054,34 +1170,110 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
if (originid != InvalidRepOriginId)
replorigin_drop(originid, false);
+
/*
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually clean up the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to roll back a dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason, we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue a WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1281,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1480,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report a WARNING for all tablesync slots so that the user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196fd7..4163fa8ad4 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -112,6 +122,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
+/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
/*
* Exit routine for synchronization worker.
*/
@@ -270,30 +316,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Clean up the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +483,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal-case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own origin,
+ * that would prevent the origin from advancing properly on
+ * the commit of its transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +893,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to clean up if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -824,6 +936,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +963,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +979,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1021,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1046,108 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * Clean up the transaction state on the publisher before performing
+ * any other operation.
+ */
+ res = walrcv_exec(wrconn, "ROLLBACK", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not rollback transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ /*
+ * If something failed during the table copy then clean up the
+ * created slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index ab1202cf9b..e04ba83e1e 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5c49da7d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..5f5c36d8e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
2.28.0.windows.1
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
--
With Regards,
Amit Kapila.
On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
FYI - I confirmed that the problem scenario that I reported yesterday
is no longer possible because now the V25 patch is disallowing the
DROP TABLE while the tablesync is still running.
PSA my test logs showing it is now working as expected.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Wed, Feb 3, 2021 at 12:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
The problem here is that we are allowing the table to be dropped while
table synchronization is still in progress, after which we don't have
any way to know the corresponding slot or origin. We could try to drop
the slot and origin as well, but that is not a good idea because slots,
once dropped, won't be rolled back. So, I have added a fix to disallow
the drop of the table while table synchronization is still in progress.
Apart from that, I have addressed the comments raised by Peter as
discussed above and made some additional changes in comments, code (the
code changes are cosmetic), and docs.
Let me know if the issue reported is fixed or not?
Yes, the issue is fixed; the table drop now results in an error:
postgres=# drop table tab_rep ;
ERROR: could not drop relation mapping for subscription "tap_sub"
DETAIL: Table synchronization for relation "tab_rep" is in progress
and is in state "f".
HINT: Use ALTER SUBSCRIPTION ... ENABLE to enable subscription if not
already enabled or use DROP SUBSCRIPTION ... to drop the subscription.
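As the hint suggests, either path should then unblock the table drop; a
minimal sketch using the names from this session:
ALTER SUBSCRIPTION tap_sub ENABLE;  -- if it was disabled: let sync reach READY
-- or, if the subscription is no longer wanted:
DROP SUBSCRIPTION tap_sub;
DROP TABLE tab_rep;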
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Feb 3, 2021 at 6:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
FYI - I confirmed that the problem scenario that I reported yesterday
is no longer possible because now the V25 patch is disallowing the
DROP TABLE while the tablesync is still running.
Thanks for the confirmation. BTW, can you please check if we can
reproduce that problem without this patch? If so, we might want to
apply this fix irrespective of this patch. If not, why not?
--
With Regards,
Amit Kapila.
On Wed, Feb 3, 2021 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 3, 2021 at 6:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
FYI - I confirmed that the problem scenario that I reported yesterday
is no longer possible because now the V25 patch is disallowing the
DROP TABLE while the tablesync is still running.
Thanks for the confirmation. BTW, can you please check if we can
reproduce that problem without this patch? If so, we might want to
apply this fix irrespective of this patch. If not, why not?
Yes, this was an existing postgres bug. It is independent of the patch.
I can reproduce exactly the same stacktrace using the HEAD src pulled @ 3/Feb.
PSA my test logs showing the details.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Tue, Feb 2, 2021 at 9:03 PM Ajin Cherian <itsajin@gmail.com> wrote:
I am sorry, my above steps were not correct. I think the reason for
the failure I was seeing lay in some other steps I did prior to this. I
will recreate this and update you with the appropriate steps.
The correct steps are as follows:
Publisher:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# INSERT INTO tab_rep SELECT generate_series(1,1000000);
INSERT 0 1000000
postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES;
CREATE PUBLICATION
Subscriber:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
Allow the tablesync to complete and then drop the subscription; the
table remains full, so restarting the subscription should fail with a
constraint violation during tablesync, but it does not.
Subscriber:
postgres=# drop subscription tap_sub ;
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
This takes the subscriber into an error loop, with no mention of what
the actual error was:
2021-02-02 05:01:34.698 EST [1549] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-02 05:01:34.739 EST [1549] ERROR: table copy could not
rollback transaction on publisher
2021-02-02 05:01:34.739 EST [1549] DETAIL: The error was: another
command is already in progress
2021-02-02 05:01:34.740 EST [8028] LOG: background worker "logical
replication worker" (PID 1549) exited with exit code 1
2021-02-02 05:01:40.107 EST [1711] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-02 05:01:40.121 EST [1711] ERROR: could not create
replication slot "pg_16479_sync_16435": ERROR: replication slot
"pg_16479_sync_16435" already exists
2021-02-02 05:01:40.121 EST [8028] LOG: background worker "logical
replication worker" (PID 1711) exited with exit code 1
2021-02-02 05:01:45.140 EST [1891] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
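For the record, an orphaned tablesync slot like the one in the log can be
removed manually on the publisher while investigating; an illustrative
command using the slot name reported above:
SELECT pg_drop_replication_slot('pg_16479_sync_16435');  -- run on the publisher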
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Feb 3, 2021 at 2:51 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Feb 3, 2021 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 3, 2021 at 6:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
FYI - I confirmed that the problem scenario that I reported yesterday
is no longer possible because now the V25 patch is disallowing the
DROP TABLE while the tablesync is still running.
Thanks for the confirmation. BTW, can you please check if we can
reproduce that problem without this patch? If so, we might want to
apply this fix irrespective of this patch. If not, why not?
Yes, this was an existing postgres bug. It is independent of the patch.
I can reproduce exactly the same stacktrace using the HEAD src pulled @ 3/Feb.
PSA my test logs showing the details.
Since this is an existing PG bug independent of this patch, I spawned
a new thread [ps0202] to deal with this problem.
----
[ps0202] /messages/by-id/CAHut+Pu7Z4a=omo+TvK5Gub2hxcJ7-3+Bu1FO_++fpFTW0oQfQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Feb 3, 2021 at 1:28 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Tue, Feb 2, 2021 at 9:03 PM Ajin Cherian <itsajin@gmail.com> wrote:
I am sorry, my above steps were not correct. I think the reason for
the failure I was seeing were some other steps I did prior to this. I
will recreate this and update you with the appropriate steps.
The correct steps are as follows:
Publisher:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# INSERT INTO tab_rep SELECT generate_series(1,1000000);
INSERT 0 1000000
postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES;
CREATE PUBLICATION
Subscriber:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
Allow the tablesync to complete and then drop the subscription; the
table remains full, so restarting the subscription should fail with a
constraint violation during tablesync, but it does not.
Subscriber:
postgres=# drop subscription tap_sub ;
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
This takes the subscriber into an error loop, with no mention of what
the actual error was:
Thanks for the report. The problem here was that the error occurred
while we were trying to copy a large amount of data. Issuing a rollback
before the entire data had been fetched led to this problem. One
alternative could be to first fetch the entire data when the error
occurs and then issue the subsequent commands. Instead, I have modified
the patch to drop the replication slot at the beginning if the relstate
is DATASYNC. Do let me know if you can think of a better way to fix
this?
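For readers following along, the relstate being tested here is visible
from SQL; an illustrative query (with tab_rep standing in for the
affected table):
SELECT srsubid, srrelid::regclass AS rel, srsubstate
FROM pg_subscription_rel
WHERE srrelid = 'tab_rep'::regclass;
-- srsubstate = 'd' (DATASYNC) means a previous copy attempt did not
-- complete; the modified patch makes a re-launched tablesync drop any
-- leftover tablesync slot first, before creating it again.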
--
With Regards,
Amit Kapila.
Attachments:
v26-0001-Tablesync-Solution1.patch (application/octet-stream)
From c5638db6bc64eecc05d20f4b473cb698011c9b9b Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 30 Jan 2021 10:21:28 +0530
Subject: [PATCH v26] Tablesync Solution1.
==== Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by process_syncing_tables_for_sync functions.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesync slots/origins
* Updates to PG docs.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/alter_subscription.sgml | 6 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 ++
src/backend/commands/subscriptioncmds.c | 445 +++++++++++++++-----
src/backend/replication/logical/launcher.c | 147 -------
src/backend/replication/logical/tablesync.c | 255 +++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
17 files changed, 644 insertions(+), 313 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..920a39dfa9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..20cdd5715d 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..a6ffd6688f 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,12 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with the
+ refresh option set to true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..e422fbffa6 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow dropping the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..81719cb27d 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List* rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,196 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if a new worker for this particular rel is restarted, it won't be
+ * able to make any progress, as we hold an exclusive lock on
+ * pg_subscription_rel till the transaction end. It will simply exit, as
+ * there is no corresponding rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise, if there is an error while doing the
+ * database operations, we won't be able to roll back the dropped slot.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ *
+ * XXX If there is a network break down while dropping the slots
+ * then we will give WARNING to the user and they need to manually
+ * remove such slots. This should happen rarely enough not to be a
+ * concern, and we don't have a better way to deal with this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +940,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +971,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1024,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1138,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1054,34 +1175,110 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
if (originid != InvalidRepOriginId)
replorigin_drop(originid, false);
+
/*
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise, if there is an error while
+ * doing the database operations, we won't be able to roll back the dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason, we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1286,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1485,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196fd7..0a51a6a01a 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -112,6 +122,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
+/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
/*
* Exit routine for synchronization worker.
*/
@@ -270,30 +316,53 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +481,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal-case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own
+ * origin, that would prevent the origin from advancing properly
+ * on the commit of the transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +891,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * impossible to recalculate what name to clean up if the subscription slot name
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -824,6 +934,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +961,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +977,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead have tried to drop the slot last time we
+ * failed, but for that, we might need to clean up the copy state as it
+ * might be in the middle of fetching the rows. Also, if there is a
+ * network breakdown then it wouldn't have succeeded, so trying it next
+ * time seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1034,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,12 +1059,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -934,7 +1075,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
if (res->status != WALRCV_OK_COMMAND)
ereport(ERROR,
(errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
+ errdetail("The error was: %s", res->err)));
walrcv_clear_result(res);
table_close(rel, NoLock);
@@ -942,6 +1083,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index ab1202cf9b..e04ba83e1e 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5c49da7d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..5f5c36d8e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
2.28.0.windows.1
On Wed, Feb 3, 2021 at 11:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Thanks for the report. The problem here was that the error occurred
> while we were trying to copy the large data. Because we issued a
> rollback before fetching the entire data, we ran into this problem.
> One alternative would be to first fetch the entire data when the
> error occurs and then issue the subsequent commands. Instead, I have
> modified the patch to perform 'drop_replication_slot' at the beginning
> if the relstate is datasync. Do let me know if you can think of a
> better way to fix this.
I have verified that the problem is not seen after this patch. I also
agree with the approach taken for the fix.
regards,
Ajin Cherian
Fujitsu Australia
On Thu, Feb 4, 2021 at 9:55 AM Ajin Cherian <itsajin@gmail.com> wrote:
> On Wed, Feb 3, 2021 at 11:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Thanks for the report. The problem here was that the error occurred
>> while we were trying to copy the large data. Because we issued a
>> rollback before fetching the entire data, we ran into this problem.
>> One alternative would be to first fetch the entire data when the
>> error occurs and then issue the subsequent commands. Instead, I have
>> modified the patch to perform 'drop_replication_slot' at the beginning
>> if the relstate is datasync. Do let me know if you can think of a
>> better way to fix this.
>
> I have verified that the problem is not seen after this patch. I also
> agree with the approach taken for the fix.
Thanks. I have fixed one of the issues reported by me earlier [1],
wherein the tablesync worker can repeatedly fail if, after dropping the
slot, there is an error while updating the SYNCDONE state in the
database. I have moved the drop of the slot to just before the commit of
the transaction where we mark the state as SYNCDONE. Additionally,
I have removed unnecessary includes in tablesync.c, updated the docs
for ALTER SUBSCRIPTION, and updated the comments at various places in
the patch. I have also updated the commit message this time.
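In outline, the tail of process_syncing_tables_for_sync then looks
roughly like this (a condensed sketch of the ordering described above;
variable names are abbreviated, the exact code is in the attached patch):

    if (sync_done)
    {
        /* End streaming so wrconn can be reused to drop the slot. */
        walrcv_endstreaming(wrconn, &tli);

        /* UpdateSubscriptionRelState must be called inside a transaction. */
        if (!IsTransactionState())
            StartTransactionCommand();

        UpdateSubscriptionRelState(subid, relid, SUBREL_STATE_SYNCDONE,
                                   current_lsn);

        /*
         * Drop the tablesync slot just before the commit performed by
         * finish_sync_worker(); if the drop errors out, the SYNCDONE
         * update is rolled back too, so a relaunched worker can retry
         * rather than failing repeatedly with the state already set.
         */
        ReplicationSlotNameForTablesync(subid, relid, syncslotname);
        ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );

        finish_sync_worker();   /* commits the transaction */
    }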
I am still not very happy with the way we handle concurrent origin
drops, but that will probably be addressed by the other patch Peter is
working on [2].
[1]: /messages/by-id/CAA4eK1JdWv84nMyCpTboBURjj70y3BfO1xdy8SYPRqNxtH7TEA@mail.gmail.com
[2]: /messages/by-id/CAHut+PsW6+7Ucb1sxjSNBaSYPGAVzQFbAEg4y1KsYQiGCnyGeQ@mail.gmail.com
--
With Regards,
Amit Kapila.
Attachments:
v27-0001-Allow-multiple-xacts-during-table-sync-in-logica.patch
From 5780fd2477ba2ba10294f45840bdadd0836e05cf Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 4 Feb 2021 14:34:00 +0530
Subject: [PATCH v27] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides to this approach: (a) we have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between the tablesync worker and the apply worker; this will be
onerous especially for large copies, (b) using a single transaction in the
synchronization phase (where we can receive WAL from multiple
transactions) risks exceeding the CID limit, (c) the slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not
finished.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith and Amit Kapila
Reviewed-by: Ajin Cherian, Hou, Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 29 +-
doc/src/sgml/ref/alter_subscription.sgml | 17 ++
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 +++
src/backend/commands/subscriptioncmds.c | 450 ++++++++++++++++++++++------
src/backend/replication/logical/launcher.c | 147 ---------
src/backend/replication/logical/tablesync.c | 240 +++++++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
17 files changed, 655 insertions(+), 315 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0..692ad65 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..43fe7f7 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,13 +248,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f..8761f03 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, we are not able to remove the slots,
+ we give WARNING and the user needs to manually remove such slots later as
+ otherwise, they will continue to reserve WAL and might eventually cause
+ the disk to fill up. See also <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3..3c8b4eb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..e422fbf 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..a50e72a 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
-
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
-
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
+ PG_TRY();
{
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots and
+ * origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- pubrel_local_oids[off++] = relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
- {
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ pubrel_local_oids[off++] = relid;
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid relid = subrel_local_oids[off];
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
+ {
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots. For this reason, we allow missing_ok = true
+ * for the drop.
+ *
+ * XXX If there is a network break down while dropping the
+ * slots then we will give a WARNING to the user and they need
+ * to manually remove such slots. This can happen so rarely to
+ * worry about and we don't have any better way to deal with
+ * this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +941,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +972,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1025,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1139,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1058,30 +1180,108 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots. For this reason, we allow missing_ok = true
+ * for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be
+ * deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1289,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1488,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514c..58082dd 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -547,51 +533,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
}
/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
-/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
void
@@ -820,109 +761,21 @@ ApplyLauncherShmemInit(void)
}
/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
-/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
-/*
* Request wakeup of the launcher on commit of the transaction.
*
* This is used to send launcher signal to stop sleeping and process the
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf..0c04e63 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -112,6 +120,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
finish_sync_worker();
}
else
@@ -411,6 +476,20 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -806,6 +885,32 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -822,6 +927,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +954,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * break down then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1027,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,12 +1052,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -940,6 +1076,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c5..ed94f57 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec15..301e494 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..4a5c49d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe..5f5c36d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
1.8.3.1
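As the documentation changes in this patch note, a tablesync slot that could
not be removed automatically has to be cleaned up by hand on the publisher.
A minimal sketch of what that looks like, assuming a hypothetical
subscription OID 16394 and table relid 16385 (both invented for
illustration; the slot name follows the generated pg_%u_sync_%u pattern):

-- Inspect the leftover tablesync slot on the publisher.
SELECT slot_name, slot_type, active
FROM pg_replication_slots
WHERE slot_name = 'pg_16394_sync_16385';

-- Drop it so that it stops reserving WAL.
SELECT pg_drop_replication_slot('pg_16394_sync_16385');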
On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
...
Thanks. I have fixed one of the issues reported by me earlier [1]
wherein the tablesync worker can repeatedly fail if after dropping the
slot there is an error while updating the SYNCDONE state in the
database. I have moved the drop of the slot just before commit of the
transaction where we are marking the state as SYNCDONE. Additionally,
I have removed unnecessary includes in tablesync.c, updated the docs
for Alter Subscription, and updated the comments at various places in
the patch. I have also updated the commit message this time.
Below are my feedback comments for V27 (nothing functional)
~~
1.
V27 Commit message:
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronizes the
position in the stream with the main apply worker.
Typo:
"synchronizes" -> "synchronize"
~~
2.
@@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable
class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, we are not able to remove the slots,
+ we give WARNING and the user needs to manually remove such slots later as
+ otherwise, they will continue to reserve WAL and might eventually cause
+ the disk to fill up. See also <xref
linkend="logical-replication-subscription-slot"/>.
+ </para>
I think the content is good, but the 1st-person wording seemed strange.
e.g.
"we are not able to remove the slots, we give WARNING and the user needs..."
Maybe it should be like:
"... PostgreSQL is unable to remove the slots, so a WARNING is
reported. The user needs... "
~~
3.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub,
bool copy_data)
...
+ * XXX If there is a network break down while dropping the
"network break down" -> "network breakdown"
~~
4.
@@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
...
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * break down then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
"network break down" -> "network breakdown"
~~
5.
@@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int
cacheid, uint32 hashvalue)
...
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
Should this comment also describe why the missing_ok is false for this case?
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Fri, Feb 5, 2021 at 7:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
...
Thanks. I have fixed one of the issues reported by me earlier [1]
wherein the tablesync worker can repeatedly fail if after dropping the
slot there is an error while updating the SYNCDONE state in the
database. I have moved the drop of the slot just before commit of the
transaction where we are marking the state as SYNCDONE. Additionally,
I have removed unnecessary includes in tablesync.c, updated the docs
for Alter Subscription, and updated the comments at various places in
the patch. I have also updated the commit message this time.
Below are my feedback comments for V27 (nothing functional)
~~
1.
V27 Commit message:
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronizes the
position in the stream with the main apply worker.
Typo:
"synchronizes" -> "synchronize"
Fixed. Also added a note that the Alter Sub .. Refresh .. command can't be
executed inside a transaction block.
~~
2.
@@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, we are not able to remove the slots,
+ we give WARNING and the user needs to manually remove such slots later as
+ otherwise, they will continue to reserve WAL and might eventually cause
+ the disk to fill up. See also <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
I think the content is good, but the 1st-person wording seemed strange.
e.g.
"we are not able to remove the slots, we give WARNING and the user needs..."
Maybe it should be like:
"... PostgreSQL is unable to remove the slots, so a WARNING is
reported. The user needs... "
Changed as per suggestion with a minor tweak.
~~
3.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub,
bool copy_data)
...
+ * XXX If there is a network break down while dropping the
"network break down" -> "network breakdown"
~~
4.
@@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
...
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * break down then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
"network break down" -> "network breakdown"
Changed as per suggestion.
~~
5.
@@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int
cacheid, uint32 hashvalue)
...
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
Should this comment also describe why the missing_ok is false for this case?
Yeah, that makes sense, so I have added a comment.
Additionally, I have changed the error code in RemoveSubscriptionRel and
moved the setup of the origin before copy_table in
LogicalRepSyncTableStart, to avoid redoing the copy if an error occurs
while setting up the origin. I have made a few comment changes as well.
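For anyone following along, the new intermediate state is visible from the
subscriber side; a minimal sketch (srsubstate 'f' is the FINISHEDCOPY state
this patch introduces):

-- Show the sync state of each subscribed table ('i', 'd', 'f', 's', or 'r').
SELECT srrelid::regclass AS relation, srsubstate
FROM pg_subscription_rel;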
--
With Regards,
Amit Kapila.
Attachments:
v28-0001-Allow-multiple-xacts-during-table-sync-in-logica.patchapplication/octet-stream; name=v28-0001-Allow-multiple-xacts-during-table-sync-in-logica.patchDownload
From d6602ac989e60a32b84438c4e5dbfe7ffa2cae56 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 4 Feb 2021 14:34:00 +0530
Subject: [PATCH v28] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive it. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove the slot
and origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not
finished.
The commands ALTER SUBSCRIPTION ... REFRESH .. and
ALTER SUBSCRIPTION ... SET PUBLICATION .. with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we couldn't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith and Amit Kapila
Reviewed-by: Ajin Cherian, Hou, Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 29 +-
doc/src/sgml/ref/alter_subscription.sgml | 18 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 ++
src/backend/commands/subscriptioncmds.c | 450 +++++++++++++++-----
src/backend/replication/logical/launcher.c | 147 -------
src/backend/replication/logical/tablesync.c | 249 +++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
17 files changed, 665 insertions(+), 315 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..692ad65de2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..43fe7f7264 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,13 +248,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..9b92ea5f20 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,24 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, a WARNING will be reported. The user needs to
+ manually remove such slots later as otherwise, they will continue to reserve
+ WAL and might eventually cause the disk to fill up. See also
+ <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..4f567fd221 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..1d3ca4325c 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
-
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
-
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
+ PG_TRY();
{
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots and
+ * origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- pubrel_local_oids[off++] = relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
- {
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ pubrel_local_oids[off++] = relid;
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid relid = subrel_local_oids[off];
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
+ {
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ *
+ * XXX If there is a network breakdown while dropping the
+ * slots then we will give a WARNING to the user and they need
+ * to manually remove such slots. This can happen so rarely to
+ * worry about and we don't have any better way to deal with
+ * this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +941,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +972,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1025,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1139,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1058,30 +1180,108 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be
+ * deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1289,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1488,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf08f..b636976dc4 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -111,6 +119,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
+/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
/*
* Exit routine for synchronization worker.
*/
@@ -269,26 +313,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to give an error if we are unable to drop the slot,
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
finish_sync_worker();
}
else
@@ -411,6 +481,20 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -805,6 +889,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -822,6 +932,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +959,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +975,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1032,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,12 +1057,45 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
+
+ /*
+ * Setup replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
+ */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */, true /* WAL log */);
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -940,6 +1114,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c52aa..ed94f57baa 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5c49da7d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..5f5c36d8e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
2.28.0.windows.1
Hello
On Friday, February 5, 2021 2:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 5, 2021 at 7:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
...
Thanks. I have fixed one of the issues reported by me earlier [1]
wherein the tablesync worker can repeatedly fail if after dropping
the slot there is an error while updating the SYNCDONE state in the
database. I have moved the drop of the slot just before commit of
the transaction where we are marking the state as SYNCDONE.
Additionally, I have removed unnecessary includes in tablesync.c,
updated the docs for Alter Subscription, and updated the comments at
various places in the patch. I have also updated the commit message this time.
Below are my feedback comments for V17 (nothing functional)
~~
1.
V27 Commit message:
For the initial table data synchronization in logical replication, we
use a single transaction to copy the entire table and then
synchronizes the position in the stream with the main apply worker.

Typo: "synchronizes" -> "synchronize"

Fixed, and added a note that the ALTER SUBSCRIPTION .. REFRESH .. command
can't be executed inside a transaction block.
Thank you for the updates.
We need to add some tests to prove the new checks of AlterSubscription() work.
I chose TAP tests as we need to set connect = true for the subscription.
Please utilize this if it can contribute to the development.
I used v28 to check my patch and it works as we expect.
Best Regards,
Takamichi Osumi
Attachments:
AlterSubscription_with_refresh_tests.patch
From f05a759c669d081be8dedbfe8a499e6df89373e0 Mon Sep 17 00:00:00 2001
From: Osumi Takamichi <osumi.takamichi@fujitsu.com>
Date: Fri, 5 Feb 2021 06:25:23 +0000
Subject: [PATCH] new 3 tests for AlterSubscription with refresh.
Check that executing ALTER SUBSCRIPTION
with refresh in a transaction or in a function is not allowed.
Author : Takamichi Osumi <osumi.takamichi@fujitsu.com>
---
src/test/subscription/t/004_sync.pl | 43 ++++++++++++++++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..e24a676 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -151,5 +151,46 @@ is($result, qq(20),
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed.
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION mypub FOR ALL TABLES;");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION mysub CONNECTION '$publisher_connstr' PUBLICATION mypub WITH (connect = true);");
+
+my ($cmdret, $stdout, $stderr);
+($cmdret, $stdout, $stderr) =
+ $node_subscriber->psql('postgres', q[
+BEGIN;
+ALTER SUBSCRIPTION mysub SET PUBLICATION mypub WITH (refresh = true);
+END;
+]);
+
+ok($stderr =~
+ qr/ALTER SUBSCRIPTION with refresh cannot run inside a transaction block/,
+ 'should fail to issue ALTER SUBSCRIPTION ... SET PUBLICATION with refresh on inside a transaction block');
+
+($cmdret, $stdout, $stderr) =
+ $node_subscriber->psql('postgres', q[
+BEGIN;
+ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
+END;
+]);
+
+ok($stderr =~
+ qr/ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block/,
+ 'should fail to issue ALTER SUBSCRIPTION ... REFRESH inside a transaction block');
+
+($cmdret, $stdout, $stderr) =
+ $node_subscriber->psql('postgres', q[
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mysub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+]);
+
+ok($stderr =~
+ qr/ALTER SUBSCRIPTION with refresh cannot be executed from a function/,
+ 'should fail to issue ALTER SUBSCRIPTION ... SET PUBLICATION with refresh on from a function');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.2.0
On Fri, Feb 5, 2021 at 12:36 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
We need to add some tests to prove the new checks of AlterSubscription() work.
I chose TAP tests as we need to set connect = true for the subscription.
When it can contribute to the development, please utilize this.
I used v28 to check my patch and works as we expect.
Thanks for writing the tests, but I don't understand why you need to
set connect = true for this test? I have tried the below with 'connect
= false' and it seems to be working:
postgres=# CREATE SUBSCRIPTION mysub
postgres-# CONNECTION 'host=localhost port=5432 dbname=postgres'
postgres-# PUBLICATION mypublication WITH (connect = false);
WARNING: tables were not subscribed, you will have to run ALTER
SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
CREATE SUBSCRIPTION
postgres=# Begin;
BEGIN
postgres=*# Alter Subscription mysub Refresh Publication;
ERROR: ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled subscriptions
So, if possible, let's write this test in src/test/regress/sql/subscription.sql.
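For example, the session above could be turned into a regression test
along these lines (a sketch only; the subscription and publication names
are made up, and the expected error is the one shown in the session above):

-- Possible addition to src/test/regress/sql/subscription.sql
CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres'
    PUBLICATION mypublication WITH (connect = false);

BEGIN;
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
-- ERROR:  ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled subscriptions
ROLLBACK;

-- Dissociate the slot first so that DROP does not try to connect.
ALTER SUBSCRIPTION mysub SET (slot_name = NONE);
DROP SUBSCRIPTION mysub;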
I have another idea for a test case: what if we write a test that
fails with a PK violation during the copy and then drops the
subscription? Then check that there is no dangling slot left on the
publisher. This is similar to a test in subscription/t/004_sync.pl;
we can reuse some of that framework but have a separate test for this.
--
With Regards,
Amit Kapila.
I did some basic cross-version testing: publisher on PG13 with
subscriber on PG14, and publisher on PG14 with subscriber on PG13.
I did some basic operations (CREATE, ALTER and STOP subscriptions) and it
seemed to work fine, no errors.
regards,
Ajin Cherian
Fujitsu Australia.
Hi,
We had a bit of a high-level discussion about these patches with Amit
off-list, so I decided to also take a look at the actual code.
My main concern originally was the potential for left-over slots on the
publisher, but I think the state now is relatively okay, with a couple of
corner cases that are documented and don't seem much worse than the main
slot.
I wonder if we should mention the max_slot_wal_keep_size GUC in the
table sync docs though.
Another thing that might need documentation is that the visibility of
changes done by table sync is no longer isolated: the table contents
will show intermediate progress to other backends, rather than switching
from nothing to a state consistent with the rest of the replication.
Some minor comments about code:
+	else if (res->status == WALRCV_ERROR && missing_ok)
+	{
+		/* WARNING. Error, but missing_ok = true. */
+		ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and
check for the appropriate ones here, because this can for example return
an error because of a timeout, not because the slot is missing. Not sure
if it matters for the current callers though (but then maybe don't call
the param missing_ok?).
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
--
Petr
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
The PG docs [1] say "there is only one copy of pg_subscription per
cluster, not one per database". IIUC that means it is not possible for
2 different subscriptions to have the same suboid. And if the suboid
is globally unique then the syncslotname is also unique. Is that
understanding not correct?
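(As a quick sanity check of that claim, pg_class records whether a
catalog is shared:)
-- returns 't', confirming pg_subscription is a shared catalog, so a
-- suboid is unique within a single cluster:
SELECT relisshared FROM pg_class WHERE relname = 'pg_subscription';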
-----
[1]: https://www.postgresql.org/docs/devel/catalog-pg-subscription.html
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sat, Feb 6, 2021 at 6:22 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
I think so. See if the alternative suggested below works, or let me know
if you have any other suggestions for the same.
The PG docs [1] say "there is only one copy of pg_subscription per
cluster, not one per database". IIUC that means it is not possible for
2 different subscriptions to have the same suboid.
I think he is talking about two different clusters having separate
subscriptions but pointing to the same publisher. In different clusters,
we can get the same subid/relid. I think we need a cluster-wide unique
identifier to distinguish among different subscribers. How about using
the system_identifier stored in the control file (we can use
GetSystemIdentifier to retrieve it)? I think one concern could be that
adding that to the slot name could exceed the max length of a slot name
(NAMEDATALEN - 1), but I don't think that is the case here
(pg_%u_sync_%u_UINT64_FORMAT: 3 + 10 + 6 + 10 + 20 + '\0'). Note the
last part is the system_identifier in this scheme.
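To sketch the idea (illustrative only, not the final patch;
GetSystemIdentifier(), UINT64_FORMAT, and NAMEDATALEN are existing
backend symbols, while the function name here is made up):

#include "postgres.h"
#include "access/xlog.h"		/* GetSystemIdentifier() */

/*
 * Illustrative sketch: make the tablesync slot name unique across
 * clusters by appending the subscriber's system identifier.
 * Worst case: 3 + 10 + 6 + 10 + 1 + 20 + 1 = 51 bytes < NAMEDATALEN.
 */
static void
tablesync_slot_name(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
{
	snprintf(syncslotname, NAMEDATALEN, "pg_%u_sync_%u_" UINT64_FORMAT,
			 suboid, relid, GetSystemIdentifier());
}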
Do you guys think that works or let me know if you have any other
better idea? Petr, is there a reason why such an identifier is not
considered originally, is there any risk in it?
--
With Regards,
Amit Kapila.
Hi
On Friday, February 5, 2021 5:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 5, 2021 at 12:36 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
We need to add some tests to prove the new checks of AlterSubscription() work.
I chose TAP tests as we need to set connect = true for the subscription.
If it can contribute to the development, please utilize it.
I used v28 to check my patch and it works as we expect.
Thanks for writing the tests, but I don't understand why you need to set
connect = true for this test. I have tried the below with 'connect = false'
and it seems to be working:
postgres=# CREATE SUBSCRIPTION mysub
postgres-# CONNECTION 'host=localhost port=5432 dbname=postgres'
postgres-# PUBLICATION mypublication WITH (connect = false);
WARNING: tables were not subscribed, you will have to run ALTER
SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
CREATE SUBSCRIPTION
postgres=# Begin;
BEGIN
postgres=*# Alter Subscription mysub Refresh Publication;
ERROR: ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled subscriptions
So, if possible, let's write this test in src/test/regress/sql/subscription.sql.
OK. I changed the place where those tests are written.
I have another idea for a test case: What if we write a test that hits
a PK violation during the copy and then drops the subscription, and
then check that there is no dangling slot left on the publisher? This
is similar to a test in subscription/t/004_sync.pl; we can reuse some
of that framework but have a separate test for this.
I've added this PK violation test to the attached tests.
The patch works with v28 and caused no failures during regression tests.
Best Regards,
Takamichi Osumi
Attachments:
refresh_and_pk_violation_testsets.patchapplication/octet-stream; name=refresh_and_pk_violation_testsets.patchDownload
From 5ad17c4ed06179889cc156d747624b060a54dc47 Mon Sep 17 00:00:00 2001
From: Osumi Takamichi <osumi.takamichi@fujitsu.com>
Date: Sat, 6 Feb 2021 07:01:46 +0000
Subject: [PATCH] tests for AlterSubscription and DROP SUBSCRIPTION
Check that executing ALTER SUBSCRIPTION
with refresh inside a transaction or a function is not allowed.
Also, confirm that DROP SUBSCRIPTION while the subscriber
is in an error state (like a PK violation) can
clean up the publisher's slots correctly.
Author : Takamichi Osumi <osumi.takamichi@fujitsu.com>
---
src/test/regress/expected/subscription.out | 20 ++++++++++++++++++++
src/test/regress/sql/subscription.sql | 21 +++++++++++++++++++++
src/test/subscription/t/004_sync.pl | 20 +++++++++++++++++++-
3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..9dde54b 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,26 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..567c451 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..b5e5748 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 9;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,25 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
+
+# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
+# when the subscriber is stuck on data copy for constraint
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(2), 'There should be 2 slots on the publisher before dropping the slots');
+
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.2.0
On 06/02/2021 06:07, Amit Kapila wrote:
On Sat, Feb 6, 2021 at 6:22 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
I think so. See if the alternative suggested below works, or let me know
if you have any other suggestions for the same.
The PG docs [1] say "there is only one copy of pg_subscription per
cluster, not one per database". IIUC that means it is not possible for
2 different subscriptions to have the same suboid.
I think he is talking about two different clusters having separate
subscriptions but pointing to the same publisher. In different clusters,
we can get the same subid/relid. I think we need a cluster-wide unique
identifier to distinguish among different subscribers. How about using
the system_identifier stored in the control file (we can use
GetSystemIdentifier to retrieve it)? I think one concern could be that
adding that to the slot name could exceed the max length of a slot name
(NAMEDATALEN - 1), but I don't think that is the case here
(pg_%u_sync_%u_UINT64_FORMAT: 3 + 10 + 6 + 10 + 20 + '\0'). Note the
last part is the system_identifier in this scheme.
Yep, that's what I meant, and system_identifier seems like a good choice to me.
Do you guys think that works or let me know if you have any other
better idea? Petr, is there a reason why such an identifier is not
considered originally, is there any risk in it?
Originally it was not considered, likely because it's all based on
pglogical/BDR work, where ids are hashes of stuff that's unique across a
group of instances, not counter-based like Oids in PostgreSQL, and I
simply didn't realize it could be a problem until reading this patch :)
--
Petr Jelinek
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
Some minor comments about code:
+	else if (res->status == WALRCV_ERROR && missing_ok)
+	{
+		/* WARNING. Error, but missing_ok = true. */
+		ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here, because this can, for example, return an
error because of a timeout, not because the slot is missing. Not sure if it
matters for current callers though (but then maybe don't call the param
missing_ok?).
You are right. The way we are using this function has evolved beyond
the original intention.
Probably renaming the param to something like "error_ok" would be more
appropriate now.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sun, Feb 7, 2021 at 2:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
Some minor comments about code:
+	else if (res->status == WALRCV_ERROR && missing_ok)
+	{
+		/* WARNING. Error, but missing_ok = true. */
+		ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here, because this can, for example, return an
error because of a timeout, not because the slot is missing. Not sure if it
matters for current callers though (but then maybe don't call the param
missing_ok?).
You are right. The way we are using this function has evolved beyond
the original intention.
Probably renaming the param to something like "error_ok" would be more
appropriate now.
PSA a patch (apply on top of V28) to change the misleading param name.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v1-0001-ReplicationSlotDropAtPubNode-param.patchapplication/octet-stream; name=v1-0001-ReplicationSlotDropAtPubNode-param.patchDownload
From 46ea93e86c0c4125735a3a115704ad2b954c0af1 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 8 Feb 2021 11:30:14 +1100
Subject: [PATCH v1] ReplicationSlotDropAtPubNode param.
Apply this patch on top of V28.
This patch only changes the name of the ReplicationSlotDropAtPubNode bool
param which says whether to give a WARNING in place of an ERROR. The
previous name "missing_ok" was misleading because we also want a WARNING
for situations where the slot cannot be dropped for reasons *other* than
just being missing (e.g. unable to drop because the connection is down).
---
src/backend/commands/subscriptioncmds.c | 20 ++++++++++----------
src/backend/replication/logical/tablesync.c | 4 ++--
src/include/replication/slot.h | 2 +-
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 1d3ca43..691140f 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -731,7 +731,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
* does not exist yet. Also, if we fail after removing some of
* the slots, next time, it will again try to drop already
* dropped slots and fail. For these reasons, we allow
- * missing_ok = true for the drop.
+ * error_ok = true for the drop.
*
* XXX If there is a network breakdown while dropping the
* slots then we will give a WARNING to the user and they need
@@ -740,7 +740,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
* this.
*/
ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
- ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* error_ok */);
}
}
}
@@ -1234,14 +1234,14 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* does not exist yet. Also, if we fail after removing some of
* the slots, next time, it will again try to drop already
* dropped slots and fail. For these reasons, we allow
- * missing_ok = true for the drop.
+ * error_ok = true for the drop.
*/
if (rstate->state != SUBREL_STATE_SYNCDONE)
{
char syncslotname[NAMEDATALEN] = {0};
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* error_ok */ );
}
}
@@ -1252,7 +1252,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* replication slot at the publisher.
*/
if (slotname)
- ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* error_ok */ );
}
PG_FINALLY();
@@ -1268,11 +1268,11 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* Drop the replication slot at the publisher node using the replication
* connection.
*
- * missing_ok - if true then only issue WARNING message if the slot cannot be
- * deleted.
+ * error_ok - if true then only issue a WARNING message instead of ERROR
+ * when the slot cannot be dropped, e.g. because it does not exist, the connection is broken, etc.
*/
void
-ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool error_ok)
{
StringInfoData cmd;
@@ -1296,9 +1296,9 @@ ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missi
(errmsg("dropped replication slot \"%s\" on publisher",
slotname)));
}
- else if (res->status == WALRCV_ERROR && missing_ok)
+ else if (res->status == WALRCV_ERROR && error_ok)
{
- /* WARNING. Error, but missing_ok = true. */
+ /* WARNING. Error, but error_ok = true. */
ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index b636976..4b8ee33 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -357,7 +357,7 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
* otherwise, it won't be dropped till the corresponding subscription
* is dropped.
*/
- ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* error_ok */);
finish_sync_worker();
}
@@ -994,7 +994,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* breakdown then it wouldn't have succeeded so trying it next time
* seems like a better bet.
*/
- ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* error_ok */);
}
else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
{
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 5f52335..e67942d 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -213,7 +213,7 @@ extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
-extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool error_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
Hi
On Friday, February 5, 2021 5:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 5, 2021 at 12:36 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:We need to add some tests to prove the new checks of AlterSubscription()
work.
I chose TAP tests as we need to set connect = true for the subscription.
When it can contribute to the development, please utilize this.
I used v28 to check my patch and works as we expect.Thanks for writing the tests but I don't understand why you need to set
connect = true for this test? I have tried below '... with connect = false' and it
seems to be working:
postgres=# CREATE SUBSCRIPTION mysub
postgres-# CONNECTION 'host=localhost port=5432
dbname=postgres'
postgres-# PUBLICATION mypublication WITH (connect = false);
WARNING: tables were not subscribed, you will have to run ALTER
SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables CREATE
SUBSCRIPTION postgres=# Begin; BEGIN postgres=*# Alter Subscription
mysub Refresh Publication;
ERROR: ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled
subscriptionsSo, if possible lets write this test in src/test/regress/sql/subscription.sql.
OK. I changed the place to write the tests for those.
I have another idea for a test case: What if we write a test such that it fails PK
violation on copy and then drop the subscription. Then check there shouldn't
be any dangling slot on the publisher? This is similar to a test in
subscription/t/004_sync.pl, we can use some of that framework but have a
separate test for this.I've added this PK violation test to the attached tests.
The patch works with v28 and made no failure during regression tests.
I checked this patch. It applied cleanly on top of V28, and all tests passed OK.
Here are two feedback comments.
1. For the regression test there are 2 x SQL tests and 1 x function test. I
thought that to cover all the combinations there should be another function
test, e.g.:
Tests ALTER … REFRESH
Tests ALTER … (refresh = true)
Tests ALTER … (refresh = true) in a function
Tests ALTER … REFRESH in a function <== this combination is not being
tested ??
2. For the 004 test case I know the test needs some PK constraint violation:
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK
violation. I think you must be relying on data that is leftover from
some previous test case but I am not sure which one. Can you make the
comment more detailed to say *how* the PK violation is happening - e.g.
something to say which rows, in which table, and inserted by whom?
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Feb 8, 2021 at 8:06 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
I have another idea for a test case: What if we write a test that hits
a PK violation during the copy and then drops the subscription, and
then check that there is no dangling slot left on the publisher? This
is similar to a test in subscription/t/004_sync.pl; we can reuse some
of that framework but have a separate test for this.
I've added this PK violation test to the attached tests.
The patch works with v28 and caused no failures during regression tests.
I checked this patch. It applied cleanly on top of V28, and all tests passed OK.
Here are two feedback comments.
1. For the regression test there are 2 x SQL tests and 1 x function test. I
thought that to cover all the combinations there should be another function
test, e.g.:
Tests ALTER … REFRESH
Tests ALTER … (refresh = true)
Tests ALTER … (refresh = true) in a function
Tests ALTER … REFRESH in a function <== this combination is not being
tested ??
I am not sure whether there is much value in adding more to this set
of negative test cases unless it really covers a different code path,
which I think won't happen if we add more tests here.
--
With Regards,
Amit Kapila.
Hello
On Mon, Feb 8, 2021 12:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 8, 2021 at 8:06 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
I have another idea for a test case: What if we write a test that hits
a PK violation during the copy and then drops the subscription, and
then check that there is no dangling slot left on the publisher?
This is similar to a test in subscription/t/004_sync.pl; we can
reuse some of that framework but have a separate test for this.
I've added this PK violation test to the attached tests.
The patch works with v28 and caused no failures during regression tests.
I checked this patch. It applied cleanly on top of V28, and all tests
passed OK.
Here are two feedback comments.
1. For the regression test there are 2 x SQL tests and 1 x function test. I
thought that to cover all the combinations there should be another function
test, e.g.:
Tests ALTER … REFRESH
Tests ALTER … (refresh = true)
Tests ALTER … (refresh = true) in a function
Tests ALTER … REFRESH in a function <== this combination is not being
tested ??
I am not sure whether there is much value in adding more to this set
of negative test cases unless it really covers a different code path,
which I think won't happen if we add more tests here.
Yeah, I agree. Accordingly, I didn't fix that part.
On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
2. For the 004 test case I know the test needs some PK constraint violation:
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK violation. I
think you must be relying on data that is leftover from some previous test
case but I am not sure which one. Can you make the comment more detailed to
say *how* the PK violation is happening - e.g. something to say which rows,
in which table, and inserted by whom?
I added some comments to clarify how the PK violation happens.
Please have a look.
Best Regards,
Takamichi Osumi
Attachments:
refresh_and_pk_violation_testsets_v02.patchapplication/octet-stream; name=refresh_and_pk_violation_testsets_v02.patchDownload
From 2466de92dc16f42f3a15600de39d47d01472ab98 Mon Sep 17 00:00:00 2001
From: Osumi Takamichi <osumi.takamichi@fujitsu.com>
Date: Mon, 8 Feb 2021 04:21:51 +0000
Subject: [PATCH v02] tests for AlterSubscription and DROP SUBSCRIPTION
Check that executing ALTER SUBSCRIPTION
with refresh inside a transaction or a function is not allowed.
Also, confirm that DROP SUBSCRIPTION while the subscriber
is in an error state (like a PK violation) can
clean up the publisher's slots correctly.
---
src/test/regress/expected/subscription.out | 20 ++++++++++++++++++++
src/test/regress/sql/subscription.sql | 21 +++++++++++++++++++++
src/test/subscription/t/004_sync.pl | 25 ++++++++++++++++++++++++-
3 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..9dde54b 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,26 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..567c451 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..3a5f273 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 9;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,30 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
+
+# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
+# when the subscriber is stuck on data copy for constraint.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# Here, tab_rep on both publisher and subscriber already has the exact same records.
+# Recreate the subscription so that the tablesync worker runs the initial copy again,
+# which violates the unique constraint of tab_rep on the subscriber from the beginning of synchronization.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(2), 'There should be 2 slots on the publisher before dropping the slots');
+
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.2.0
On Monday, February 8, 2021 1:44 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
On Mon, Feb 8, 2021 12:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 8, 2021 at 8:06 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
I have another idea for a test case: What if we write a test that hits
a PK violation during the copy and then drops the subscription, and
then check that there is no dangling slot left on the publisher?
This is similar to a test in subscription/t/004_sync.pl; we can
reuse some of that framework but have a separate test for this.
I've added this PK violation test to the attached tests.
The patch works with v28 and caused no failures during regression tests.
I checked this patch. It applied cleanly on top of V28, and all tests passed OK.
Here are two feedback comments.
1. For the regression test there are 2 x SQL tests and 1 x function test. I
thought that to cover all the combinations there should be another function
test, e.g.:
Tests ALTER … REFRESH
Tests ALTER … (refresh = true)
Tests ALTER … (refresh = true) in a function
Tests ALTER … REFRESH in a function <== this combination is not being
tested ??
I am not sure whether there is much value in adding more to this set
of negative test cases unless it really covers a different code path,
which I think won't happen if we add more tests here.
Yeah, I agree. Accordingly, I didn't fix that part.
On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
2. For the 004 test case I know the test needs some PK constraint violation:
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK violation. I
think you must be relying on data that is leftover from some previous test
case but I am not sure which one. Can you make the comment more detailed to
say *how* the PK violation is happening - e.g. something to say which rows,
in which table, and inserted by whom?
I added some comments to clarify how the PK violation happens.
Please have a look.
Sorry, I had one typo in the tests of subscription.sql in v02.
I used 'foo' in the first test ("ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true)"),
but I should have used 'mypub' to make this test clearly independent from other previous tests.
Attached is the fixed version.
Best Regards,
Takamichi Osumi
Attachments:
refresh_and_pk_violation_testsets_v03.patchapplication/octet-stream; name=refresh_and_pk_violation_testsets_v03.patchDownload
From 706ec27a926818126053113bbe4c04a4895b484d Mon Sep 17 00:00:00 2001
From: Osumi Takamichi <osumi.takamichi@fujitsu.com>
Date: Mon, 8 Feb 2021 06:48:09 +0000
Subject: [PATCH v03] tests for AlterSubscription and DROP SUBSCRIPTION
Check that executing ALTER SUBSCRIPTION
with refresh inside a transaction or a function is not allowed.
Also, confirm that DROP SUBSCRIPTION while the subscriber
is in an error state (like a PK violation) can
clean up the publisher's slots correctly.
---
src/test/regress/expected/subscription.out | 20 ++++++++++++++++++++
src/test/regress/sql/subscription.sql | 21 +++++++++++++++++++++
src/test/subscription/t/004_sync.pl | 25 ++++++++++++++++++++++++-
3 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..0fd1fe4 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,26 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..c3566fc 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..3a5f273 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 9;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,30 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
+
+# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
+# when the subscriber is stuck on data copy for constraint.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# Here, tab_rep on both publisher and subscriber already has the exact same records.
+# Recreate the subscription so that the tablesync worker runs the initial copy again,
+# which violates the unique constraint of tab_rep on the subscriber from the beginning of synchronization.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(2), 'There should be 2 slots on the publisher before dropping the slots');
+
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.2.0
On Fri, Feb 5, 2021 at 8:40 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
We had a bit of a high-level discussion about these patches with Amit
off-list, so I decided to also take a look at the actual code.
Thanks for the discussion and a follow-up review.
My main concern originally was the potential for left-over slots on the
publisher, but I think the state now is relatively okay, with a couple of
corner cases that are documented and don't seem much worse than the main
slot.
I wonder if we should mention the max_slot_wal_keep_size GUC in the
table sync docs though.
I have added a reference to this in ALTER SUBSCRIPTION, where we
mention the risk of leftover slots. Let me know if you have
something else in mind.
Another thing that might need documentation is that the visibility of
changes done by table sync is no longer isolated: the table contents will
show intermediate progress to other backends, rather than switching from
nothing to a state consistent with the rest of the replication.
Agreed and updated the docs accordingly.
Some minor comments about code:
+	else if (res->status == WALRCV_ERROR && missing_ok)
+	{
+		/* WARNING. Error, but missing_ok = true. */
+		ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here, because this can, for example, return an
error because of a timeout, not because the slot is missing.
I think there are both pros and cons of distinguishing the error
("slot does not exist" from others). The benefit is that if there is a
network glitch then the user can probably retry the Alter/Drop commands
and they will be successful next time. OTOH, say the network is broken
for a long time and the user wants to proceed; then there won't be any
way to proceed for ALTER SUBSCRIPTION ... REFRESH or the DROP command.
So by giving a WARNING, at least we can provide a way to proceed, and
they can drop such slots later. We have mentioned this in the docs as
well. I think we can go either way here; let me know which way you
think is better.
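As a rough sketch, the distinguishing variant could look something like
this (purely illustrative: it assumes an error-code field, say
res->sqlstate, which WalRcvExecResult does not have in v29; res->err and
ERRCODE_UNDEFINED_OBJECT do exist):

	else if (res->status == WALRCV_ERROR)
	{
		/*
		 * Hypothetical: tolerate only a missing slot; treat timeouts,
		 * broken connections, etc. as hard errors so that the command
		 * can simply be retried once the network recovers.
		 */
		if (missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
			ereport(WARNING,
					(errmsg("replication slot \"%s\" does not exist on publisher",
							slotname)));
		else
			ereport(ERROR,
					(errmsg("could not drop replication slot \"%s\" on publisher: %s",
							slotname, res->err)));
	}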
Not sure if it matters for current callers though (but then maybe don't
call the param missing_ok?).
Sure, if we decide not to change the behavior as suggested by you then
this makes sense.
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
As discussed, added system_identifier to distinguish subscriptions
between different clusters.
Apart from fixing the above comment, I have integrated it with the new
replorigin_drop_by_name() API being discussed in the thread [1] and
posted that patch just for ease. I have also integrated Osumi-San's
test case patch with minor modifications.
[1]: /messages/by-id/CAA4eK1L7mLhY=wyCB0qsEGUpfzWfncDSS9_0a4Co+N0GUyNGNQ@mail.gmail.com
--
With Regards,
Amit Kapila.
Attachments:
v29-0001-Make-pg_replication_origin_drop-safe-against-con.patchapplication/octet-stream; name=v29-0001-Make-pg_replication_origin_drop-safe-against-con.patchDownload
From 6516c90d75cc00224009eacdcc07892d12c0dba9 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 5 Feb 2021 12:20:13 +0530
Subject: [PATCH v29 1/2] Make pg_replication_origin_drop safe against
concurrent drops.
Currently, we get the origin id from the name and then drop the origin by
taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent
sessions can get the id from the name at the same time and then when they
try to drop the origin, one of the sessions will get either
"tuple concurrently deleted" or "cache lookup failed for replication
origin ..".
To prevent this race condition we do the entire operation under lock. This
obviates the need for the replorigin_drop() API and we have removed it, so
if any extension authors are using it they need to instead use
replorigin_drop_by_name(). See its usage in pg_replication_origin_drop().
Author: Peter Smith
Reviewed-by: Amit Kapila, Euler Taveira, and Petr Jelinek
Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com
---
src/backend/commands/subscriptioncmds.c | 5 +--
src/backend/replication/logical/origin.c | 54 ++++++++++++++++++--------------
src/include/replication/origin.h | 2 +-
3 files changed, 33 insertions(+), 28 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..5ccbc9d 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -926,7 +926,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ListCell *lc;
char originname[NAMEDATALEN];
char *err = NULL;
- RepOriginId originid;
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
@@ -1050,9 +1049,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
/* Remove the origin tracking if exists. */
snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ replorigin_drop_by_name(originname, true, false);
/*
* If there is no slot associated with the subscription, we can finish
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 9bd761a..d24c3ad 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -322,27 +322,15 @@ replorigin_create(char *roname)
return roident;
}
-
/*
- * Drop replication origin.
- *
- * Needs to be called in a transaction.
+ * Helper function to drop a replication origin.
*/
-void
-replorigin_drop(RepOriginId roident, bool nowait)
+static void
+replorigin_drop_guts(Relation rel, RepOriginId roident, bool nowait)
{
HeapTuple tuple;
- Relation rel;
int i;
- Assert(IsTransactionState());
-
- /*
- * To interlock against concurrent drops, we hold ExclusiveLock on
- * pg_replication_origin throughout this function.
- */
- rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
-
/*
* First, clean up the slot state info, if there is any matching slot.
*/
@@ -415,11 +403,35 @@ restart:
ReleaseSysCache(tuple);
CommandCounterIncrement();
-
- /* now release lock again */
- table_close(rel, ExclusiveLock);
}
+/*
+ * Drop replication origin (by name).
+ *
+ * Needs to be called in a transaction.
+ */
+void
+replorigin_drop_by_name(char *name, bool missing_ok, bool nowait)
+{
+ RepOriginId roident;
+ Relation rel;
+
+ Assert(IsTransactionState());
+
+ /*
+ * To interlock against concurrent drops, we hold ExclusiveLock on
+ * pg_replication_origin throughout this function.
+ */
+ rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
+
+ roident = replorigin_by_name(name, missing_ok);
+
+ if (OidIsValid(roident))
+ replorigin_drop_guts(rel, roident, nowait);
+
+ /* We keep the lock on pg_replication_origin until commit */
+ table_close(rel, NoLock);
+}
/*
* Lookup replication origin via its oid and return the name.
@@ -1256,16 +1268,12 @@ Datum
pg_replication_origin_drop(PG_FUNCTION_ARGS)
{
char *name;
- RepOriginId roident;
replorigin_check_prerequisites(false, false);
name = text_to_cstring((text *) DatumGetPointer(PG_GETARG_DATUM(0)));
- roident = replorigin_by_name(name, false);
- Assert(OidIsValid(roident));
-
- replorigin_drop(roident, true);
+ replorigin_drop_by_name(name, false, true);
pfree(name);
diff --git a/src/include/replication/origin.h b/src/include/replication/origin.h
index 731445a..d2ed630 100644
--- a/src/include/replication/origin.h
+++ b/src/include/replication/origin.h
@@ -40,7 +40,7 @@ extern PGDLLIMPORT TimestampTz replorigin_session_origin_timestamp;
/* API for querying & manipulating replication origins */
extern RepOriginId replorigin_by_name(char *name, bool missing_ok);
extern RepOriginId replorigin_create(char *name);
-extern void replorigin_drop(RepOriginId roident, bool nowait);
+extern void replorigin_drop_by_name(char *name, bool missing_ok, bool nowait);
extern bool replorigin_by_oid(RepOriginId roident, bool missing_ok,
char **roname);
--
1.8.3.1
v29-0002-Allow-multiple-xacts-during-table-sync-in-logica.patchapplication/octet-stream; name=v29-0002-Allow-multiple-xacts-during-table-sync-in-logica.patchDownload
From 5cb4aa304c55813b491a7a7c21b005619afd5fd6 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 6 Feb 2021 17:16:19 +0530
Subject: [PATCH v29 2/2] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive it. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not
finished.
The commands ALTER SUBSCRIPTION ... REFRESH .. and
ALTER SUBSCRIPTION ... SET PUBLICATION .. with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 59 ++--
doc/src/sgml/ref/alter_subscription.sgml | 20 ++
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 +++
src/backend/commands/subscriptioncmds.c | 468 ++++++++++++++++++++++------
src/backend/replication/logical/launcher.c | 147 ---------
src/backend/replication/logical/tablesync.c | 226 ++++++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 2 +-
src/test/regress/expected/subscription.out | 21 ++
src/test/regress/sql/subscription.sql | 22 ++
src/test/subscription/t/004_sync.pl | 21 +-
src/tools/pgindent/typedefs.list | 1 -
20 files changed, 742 insertions(+), 328 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0..692ad65 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..d467923 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -186,9 +186,10 @@
<para>
Each subscription will receive changes via one replication slot (see
- <xref linkend="streaming-replication-slots"/>). Additional temporary
- replication slots may be required for the initial data synchronization
- of pre-existing table data.
+ <xref linkend="streaming-replication-slots"/>). Additional replication
+ slots may be required for the initial data synchronization of
+ pre-existing table data and those will be dropped at the end of data
+ synchronization.
</para>
<para>
@@ -248,13 +249,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
@@ -468,16 +480,19 @@
<sect2 id="logical-replication-snapshot">
<title>Initial Snapshot</title>
<para>
- The initial data in existing subscribed tables are snapshotted and
- copied in a parallel instance of a special kind of apply process.
- This process will create its own temporary replication slot and
- copy the existing data. Once existing data is copied, the worker
- enters synchronization mode, which ensures that the table is brought
- up to a synchronized state with the main apply process by streaming
- any changes that happened during the initial data copy using standard
- logical replication. Once the synchronization is done, the control
- of the replication of the table is given back to the main apply
- process where the replication continues as normal.
+ The initial data in existing subscribed tables are snapshotted and
+ copied in a parallel instance of a special kind of apply process.
+ This process will create its own replication slot and copy the existing
+ data. As soon as the copy is finished the table contents will become
+ visible to other backends. Once existing data is copied, the worker
+ enters synchronization mode, which ensures that the table is brought
+ up to a synchronized state with the main apply process by streaming
+ any changes that happened during the initial data copy using standard
+ logical replication. During this synchronization phase, the changes
+ are applied and committed in the same order as they happened on the
+ publisher. Once the synchronization is done, the control of the
+ replication of the table is given back to the main apply process where
+ the replication continues as normal.
</para>
</sect2>
</sect1>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f..eb3e09b 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,26 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, a WARNING will be reported. The user needs to
+ manually remove such slots later or the
+ <xref linkend="guc-max-slot-wal-keep-size"/> should be configured on the
+ remote host as otherwise, they will continue to reserve WAL and might
+ eventually cause the disk to fill up. See also
+ <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3..3c8b4eb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..4f567fd 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5ccbc9d..d856dfd 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,212 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots and
+ * origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ char originname[NAMEDATALEN];
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for
+ * tablesync worker, this can happen for the states before
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time the
+ * origin might be already removed. For these reasons,
+ * passing missing_ok = true from here.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ replorigin_drop_by_name(originname, true, false);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ *
+ * XXX If there is a network breakdown while dropping the
+ * slots then we will give a WARNING to the user and they need
+ * to manually remove such slots. This can happen so rarely to
+ * worry about and we don't have any better way to deal with
+ * this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +956,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +987,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -927,8 +1039,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char originname[NAMEDATALEN];
char *err = NULL;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1041,6 +1153,36 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for tablesync
+ * worker so passing missing_ok = true from here. This can happen
+ * for the states before SUBREL_STATE_FINISHEDCOPY.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ replorigin_drop_by_name(originname, true, false);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1055,30 +1197,108 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be
+ * deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1086,27 +1306,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1275,3 +1505,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514c..58082dd 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -547,51 +533,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
}
/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
-/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
void
@@ -820,109 +761,21 @@ ApplyLauncherShmemInit(void)
}
/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
-/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
-/*
* Request wakeup of the launcher on commit of the transaction.
*
* This is used to send launcher signal to stop sleeping and process the
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf..38df892 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -269,26 +277,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to give an error if we are unable to drop the slot,
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
finish_sync_worker();
}
else
@@ -403,6 +437,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -411,6 +447,26 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ *
+ * There is a chance that the user is concurrently performing
+ * refresh for the subscription where we remove the table
+ * state and its origin and by this time the origin might be
+ * already removed. So passing missing_ok = true from here.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u",
+ MyLogicalRepWorker->subid, rstate->relid);
+ replorigin_drop_by_name(originname, true, false);
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -806,6 +862,37 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We do append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -822,6 +909,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +936,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +952,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1009,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,12 +1034,45 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
+
+ /*
+ * Setup replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
+ */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */, true /* WAL log */);
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -941,6 +1092,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommandCounterIncrement();
/*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ /*
* We are done with the initial data synchronization, update the state.
*/
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c5..ed94f57 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec15..301e494 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..f8a37a7 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,13 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..7802279 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..ca0d782 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,28 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..c792668 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 8;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,26 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Table tap_rep already has the same records on both publisher and subscriber
+# at this time. Recreate the subscription which will do the initial copy of
+# the table again and fails due to unique constraint violation.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# DROP SUBSCRIPTION must clean up slots on the publisher side when the
+# subscriber is stuck on data copy for constraint violation.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe..5f5c36d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
1.8.3.1
On Mon, Feb 8, 2021 at 12:22 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
On Monday, February 8, 2021 1:44 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
2. For the 004 test case I know the test is needing some PK constraint
violation
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK
violation. I think you must be relying on data that is leftover from
some previous test case but I am not sure which one. Can you make the
comment more detailed to say *how* the PK violation is happening -
e.g. something to say which rows, in which table, and inserted by whom?

I added some comments to clarify how the PK violation happens.
Please have a look.

Sorry, I had a typo in the tests of subscription.sql in v2.
I used 'foo' for the first test of "ALTER SUBSCRIPTION mytest SET
PUBLICATION foo WITH (refresh = true)" in v02, but I should have used
'mypub' to make this test clearly independent from other previous tests.
Attached the fixed version.
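For illustration, the violation being discussed arises roughly as
follows (a sketch: the connection string is elided, and the constraint
name assumes the default naming for tab_rep's primary key):

-- tab_rep already holds identical rows on both publisher and subscriber,
-- so re-creating the subscription re-runs the initial table copy
CREATE SUBSCRIPTION tap_sub CONNECTION '...' PUBLICATION tap_pub;
-- the tablesync COPY of tab_rep then fails on the subscriber with
--   ERROR:  duplicate key value violates unique constraint "tab_rep_pkey"
-- leaving the subscription stuck in the data copy phase, which is the
-- state the test needs in order to exercise slot cleanup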
Thanks. I have integrated this into the main patch with minor
modifications in the comments. The main change I have made is to
remove the test that checked that two slots remain after the initial
sync failure. This is because on restart the tablesync worker again
tries to drop the slot, so we can't guarantee that the tablesync slot
will still be there. I think this is a timing issue, so it might not
have occurred on your machine, but I could reproduce it by repeated
runs of the tests provided by you.
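For clarity, the check that remains asserts only the final state, which
is not timing-sensitive (a sketch of the idea, run on the publisher):

-- after DROP SUBSCRIPTION, no replication slots should remain at all
SELECT count(*) FROM pg_replication_slots;  -- expect 0
-- asserting an exact intermediate count (say, two slots while the copy
-- keeps failing) would be racy, because the restarted tablesync worker
-- retries dropping its own slot at an unpredictable moment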
--
With Regards,
Amit Kapila.
On Mon, Feb 8, 2021 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Feb 7, 2021 at 2:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
Some minor comments about code:
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
I wonder if we need to add error code to the WalRcvExecResult and check
for the appropriate ones here. Because this can for example return an
error because of a timeout, not because the slot is missing. Not sure
if it matters for current callers though (but then maybe don't call the
param missing_ok?).

You are right. The way we are using this function has evolved beyond
the original intention.
Probably renaming the param to something like "error_ok" would be more
appropriate now.

PSA a patch (apply on top of V28) to change the misleading param name.
PSA an alternative patch. This one adds a new member to
WalRcvExecResult and so is able to detect the "slot does not exist"
error. This patch also applies on top of V28, if you want it.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v1-0001-ReplicationSlotDropAtPubNode-detect-slot-does-not.patchapplication/octet-stream; name=v1-0001-ReplicationSlotDropAtPubNode-detect-slot-does-not.patchDownload
From 4fa3b2c59959c1a3a8d878dd62dda7683db7a08e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 9 Feb 2021 10:27:14 +1100
Subject: [PATCH v1] ReplicationSlotDropAtPubNode detect slot does not exist.
A new sqlstate member was added to WalRcvExecResult. This allows walrcv_exec calling code to know the detail of the cause of any error. Specifically, here it means the ReplicationSlotDropAtPubNode function can now identify the "slot does not exist" error, and so can handle "missing_ok" more correctly.
---
src/backend/commands/subscriptioncmds.c | 3 ++-
src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 8 ++++++++
src/include/replication/walreceiver.h | 1 +
3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 1d3ca43..eee7512 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1296,7 +1296,8 @@ ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missi
(errmsg("dropped replication slot \"%s\" on publisher",
slotname)));
}
- else if (res->status == WALRCV_ERROR && missing_ok)
+ else if (res->status == WALRCV_ERROR &&
+ missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
{
/* WARNING. Error, but missing_ok = true. */
ereport(WARNING,
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e958274..7714696 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -982,6 +982,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
{
PGresult *pgres = NULL;
WalRcvExecResult *walres = palloc0(sizeof(WalRcvExecResult));
+ char *diag_sqlstate;
if (MyDatabaseId == InvalidOid)
ereport(ERROR,
@@ -1025,6 +1026,13 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
case PGRES_BAD_RESPONSE:
walres->status = WALRCV_ERROR;
walres->err = pchomp(PQerrorMessage(conn->streamConn));
+ diag_sqlstate = PQresultErrorField(pgres, PG_DIAG_SQLSTATE);
+ if (diag_sqlstate)
+ walres->sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+ diag_sqlstate[1],
+ diag_sqlstate[2],
+ diag_sqlstate[3],
+ diag_sqlstate[4]);
break;
}
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 4313f51..a97a59a 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -210,6 +210,7 @@ typedef enum
typedef struct WalRcvExecResult
{
WalRcvExecStatus status;
+ int sqlstate;
char *err;
Tuplestorestate *tuplestore;
TupleDesc tupledesc;
--
1.8.3.1
On Mon, Feb 8, 2021 8:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 8, 2021 at 12:22 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Monday, February 8, 2021 1:44 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> wrote:

2. For the 004 test case I know the test needs some PK constraint violation:
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK violation.
I think you must be relying on data that is leftover from some previous
test case but I am not sure which one. Can you make the comment more
detailed to say *how* the PK violation is happening - e.g. something to
say which rows, in which table, and inserted by whom?

I added some comments to clarify how the PK violation happens.
Please have a look.

Sorry, I had one typo in the tests of subscription.sql in v2.
I used 'foo' for the first test of "ALTER SUBSCRIPTION mytest SET
PUBLICATION foo WITH (refresh = true)" in v02, but I should have used
'mypub' to make this test clearly independent from other previous tests.
Attached the fixed version.
Thanks. I have integrated this into the main patch with minor modifications
in the comments. The main change I have done is to remove the test that was
checking that there are two slots remaining after the initial sync failure.
This is because on restart of the tablesync worker we again try to drop the
slot, so we can't guarantee that the tablesync slot will remain. I think
this is a timing issue, so it might not have occurred on your machine, but
I could reproduce it by repeated runs of the tests provided by you.

OK. I understand. Thank you so much for having modified it and integrated
it into the main patch.
Best Regards,
Takamichi Osumi
Here are my feedback comments for the V29 patch.
====
FILE: logical-replication.sgml
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier<parameter>sysid</parameter>)
+ </para>
1.
There is a missing space before the sysid parameter.
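For example, that line would then presumably read:

+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)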
=====
FILE: subscriptioncmds.c
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time the
+ * origin might be already removed. For these reasons,
+ * passing missing_ok = true from here.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ replorigin_drop_by_name(originname, true, false);
+ }
2.
Don't really need to say "from here".
(same comment applies multiple places, in this file and in tablesync.c)
3.
Previously the tablesync origin name format was encapsulated in a
common function. IMO it was cleaner/safer how it was before, instead
of the same "pg_%u_%u" cut/paste and scattered in many places.
(same comment applies multiple places, in this file and in tablesync.c)
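To illustrate, a minimal sketch of the kind of wrapper meant here (the
helper name and exact signature below are just an assumption for
illustration; the later v30 patch adds a similar function,
ReplicationOriginNameForTableSync):

/* Keep the tablesync origin name format in exactly one place. */
static void
tablesync_origin_name(Oid suboid, Oid relid, char originname[NAMEDATALEN])
{
	snprintf(originname, NAMEDATALEN, "pg_%u_%u", suboid, relid);
}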
4.
Calls like replorigin_drop_by_name(originname, true, false); make it
unnecessarily hard to read code when the boolean params are neither
named as variables nor commented. I noticed on another thread [et0205]
there was an idea that having no name/comments is fine because anyway
it is not difficult to figure out when using a "modern IDE", but since
my review tools are only "vi" and "meld" I beg to differ with that
justification.
(same comment applies multiple places, in this file and in tablesync.c)
[et0205] /messages/by-id/c1d9833f-eeeb-40d5-89ba-87674e1b7ba3@www.fastmail.com
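For example, one sketch of a way to keep those call sites readable without
extra variables would be inline parameter comments:

replorigin_drop_by_name(originname,
                        true,   /* missing_ok */
                        false); /* nowait */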
=====
FILE: tablesync.c
5.
Previously there was a function tablesync_replorigin_drop which was
encapsulating the tablesync origin name formatting. I thought that was
better than the V29 code which now has the same formatting scattered
over many places.
(same comment applies for worker_internal.h)
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We do append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
6.
"We do append" --> "We append"
"With current scheme" -> "With the current scheme"
7.
Maybe consider just assigning GetSystemIdentifier() to a static
instead of calling that function for every slot?
static uint64 sysid = GetSystemIdentifier();
IIUC the sysid value is never going to change for a process, right?
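Note, though, that a static local in C can't be initialized with a function
call, so a sketch of this idea (hypothetical helper name) would need lazy
initialization along these lines:

static uint64
cached_system_identifier(void)
{
	static uint64 sysid = 0;

	if (sysid == 0)
		sysid = GetSystemIdentifier();
	return sysid;
}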
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Feb 8, 2021 at 9:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 5, 2021 at 8:40 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

Hi,

We had a bit of a high-level discussion about these patches with Amit
off-list, so I decided to also take a look at the actual code.

Thanks for the discussion and a follow-up review.

My main concern originally was the potential for left-over slots on the
publisher, but I think the state now is relatively okay, with a couple
of corner cases that are documented and don't seem much worse than the
main slot. I wonder if we should mention the max_slot_wal_keep_size GUC
in the table sync docs though.

I have added a reference to this in Alter Subscription where we
mentioned the risk of leftover slots. Let me know if you have something
else in mind.

Another thing that might need documentation is that the visibility of
changes done by table sync is no longer isolated, in that table contents
will show intermediate progress to other backends, rather than switching
from nothing to a state consistent with the rest of replication.

Agreed and updated the docs accordingly.
Some minor comments about code:

+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,

I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here. Because this can for example return an error
because of a timeout, not because the slot is missing.

I think there are both pros and cons of distinguishing the error ("slot
does not exist" from others). The benefit is that if there is a network
glitch then the user can probably retry the commands Alter/Drop and it
will be successful next time. OTOH, say the network is broken for a long
time and the user wants to proceed, but there won't be any way to proceed
for Alter Subscription ... Refresh or the Drop command. So by giving a
WARNING at least we can provide a way to proceed, and then they can drop
such slots later. We have mentioned this in the docs as well. I think we
can go either way here; let me know what you think is a better way?

Not sure if it matters for current callers though (but then maybe don't
call the param missing_ok?).

Sure, if we decide not to change the behavior as suggested by you then
this makes sense.

+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}

Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and
reloid; will one of them drop the slot of the other one? Previously, with
the cleanup being left to the temp slot, we'd at maximum have got an error
when creating it, but with the new logic in LogicalRepSyncTableStart it
feels like we could get into a situation where 2 downstreams are fighting
over a slot, no?

As discussed, added system_identifier to distinguish subscriptions
between different clusters.

Apart from fixing the above comment, I have integrated it with the new
replorigin_drop_by_name() API being discussed in the thread [1] and
posted that patch just for ease. I have also integrated Osumi-San's
test case patch with minor modifications.

[1] - /messages/by-id/CAA4eK1L7mLhY=wyCB0qsEGUpfzWfncDSS9_0a4Co+N0GUyNGNQ@mail.gmail.com
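As an aside, here is a tiny standalone sketch of the collision scenario
(all OID values and the second system identifier below are made up):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* Two independent subscriber clusters that happen to allocate the
	 * same subscription OID and relation OID ... */
	unsigned	suboid = 16397, relid = 16389;
	/* ... but, by definition, different system identifiers. */
	uint64_t	sysid_a = 6927117142022745645ULL;
	uint64_t	sysid_b = 7005550123456789012ULL;
	char		a[64], b[64];

	/* Old scheme: both clusters compute the same slot name, so one
	 * cluster's DROP can remove the other's slot on the publisher. */
	snprintf(a, sizeof(a), "pg_%u_sync_%u", suboid, relid);
	snprintf(b, sizeof(b), "pg_%u_sync_%u", suboid, relid);
	printf("old: %s vs %s (collision)\n", a, b);

	/* New scheme: the sysid suffix keeps the names distinct. */
	snprintf(a, sizeof(a), "pg_%u_sync_%u_%llu", suboid, relid,
			 (unsigned long long) sysid_a);
	snprintf(b, sizeof(b), "pg_%u_sync_%u_%llu", suboid, relid,
			 (unsigned long long) sysid_b);
	printf("new: %s vs %s\n", a, b);
	return 0;
}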
--
With Regards,
Amit Kapila.
More V29 Feedback
FILE: alter_subscription.sgml
8.
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
My guess is those two lots of double dots ("..") were probably meant
to be ellipsis ("...")
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Looking at the V29 style tablesync slot names now they appear like this:
WARNING: could not drop tablesync replication slot
"pg_16397_sync_16389_6927117142022745645"
That is in the order subid + relid + sysid
Now that I see it in a message it seems a bit strange with the sysid
just tacked onto the end like that.
I am wondering if reordering from parent to child might be more natural,
e.g. sysid + subid + relid gives a more intuitive name IMO.
So in this example it would be "pg_sync_6927117142022745645_16397_16389"
Thoughts?
----
Kind Regards,
Peter Smith
Fujitsu Australia
When looking at the DropSubscription code I noticed that there is a
small difference between the HEAD code and the V29 code when slot_name
= NONE.
HEAD does
------
if (!slotname)
{
table_close(rel, NoLock);
return;
}
------
V29 does
------
if (!slotname)
{
/* be tidy */
list_free(rstates);
return;
}
------
Isn't the V29 code missing doing a table_close(rel, NoLock) there?
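Presumably the early return just needs the same close as HEAD, e.g.
(sketch):

if (!slotname)
{
	/* be tidy */
	list_free(rstates);
	table_close(rel, NoLock);
	return;
}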
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Feb 9, 2021 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
Here are my feedback comments for the V29 patch.
Thanks.
3.
Previously the tablesync origin name format was encapsulated in a
common function. IMO it was cleaner/safer how it was before, instead
of the same "pg_%u_%u" cut/paste and scattered in many places.
(same comment applies multiple places, in this file and in tablesync.c)

4.
Calls like replorigin_drop_by_name(originname, true, false); make it
unnecessarily hard to read code when the boolean params are neither
named as variables nor commented. I noticed on another thread [et0205]
there was an idea that having no name/comments is fine because anyway
it is not difficult to figure out when using a "modern IDE", but since
my review tools are only "vi" and "meld" I beg to differ with that
justification.
(same comment applies multiple places, in this file and in tablesync.c)
It would be a bit convenient for you, but for most others, I think it
would be noise. Personally, I find the code more readable without such
name comments; they just break the flow of the code unless you want to
study the value of each param in detail.
[et0205] /messages/by-id/c1d9833f-eeeb-40d5-89ba-87674e1b7ba3@www.fastmail.com
=====
FILE: tablesync.c
5.
Previously there was a function tablesync_replorigin_drop which was
encapsulating the tablesync origin name formatting. I thought that was
better than the V29 code which now has the same formatting scattered
over many places.
(same comment applies for worker_internal.h)
Isn't this the same as what you want to say in point-3?
7.
Maybe consider to just assign GetSystemIdentifier() to a static
instead of calling that function for every slot?
static uint64 sysid = GetSystemIdentifier();
IIUC the sysid value is never going to change for a process, right?
That's right, but I am not sure if there is much value in saving one
call here by introducing an extra variable.
I'll fix other comments raised by you.
--
With Regards,
Amit Kapila.
On Tue, Feb 9, 2021 at 1:37 PM Peter Smith <smithpb2250@gmail.com> wrote:
Looking at the V29 style tablesync slot names now they appear like this:

WARNING: could not drop tablesync replication slot
"pg_16397_sync_16389_6927117142022745645"

That is in the order subid + relid + sysid.

Now that I see it in a message it seems a bit strange with the sysid
just tacked onto the end like that.

I am wondering if reordering from parent to child might be more natural,
e.g. sysid + subid + relid gives a more intuitive name IMO.

So in this example it would be "pg_sync_6927117142022745645_16397_16389"
I have kept the order based on the importance of each parameter. Say,
when the user sees this message in the server log of the subscriber,
either for the purpose of tracking the origin's progress or for errors,
the sysid parameter won't be of much use and they will mostly be
looking at subid and relid. OTOH, if for some reason this parameter
appears in the publisher logs then sysid might be helpful.
Petr, anyone else, do you have any opinion on this matter?
--
With Regards,
Amit Kapila.
On Tue, Feb 9, 2021 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
Here are my feedback comments for the V29 patch.
====
FILE: logical-replication.sgml
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier<parameter>sysid</parameter>)
+ </para>

1.
There is a missing space before the sysid parameter.

=====
FILE: subscriptioncmds.c

+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time the
+ * origin might be already removed. For these reasons,
+ * passing missing_ok = true from here.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ replorigin_drop_by_name(originname, true, false);
+ }

2.
Don't really need to say "from here".
(same comment applies multiple places, in this file and in tablesync.c)

3.
Previously the tablesync origin name format was encapsulated in a
common function. IMO it was cleaner/safer how it was before, instead
of the same "pg_%u_%u" cut/paste and scattered in many places.
(same comment applies multiple places, in this file and in tablesync.c)

Fixed all three of the above comments.
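For example, in the attached v30 the drop sites now go through the common
helper:

+ ReplicationOriginNameForTableSync(sub->oid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);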
4.
Calls like replorigin_drop_by_name(originname, true, false); make it
unnecessarily hard to read code when the boolean params are neither
named as variables nor commented. I noticed on another thread [et0205]
there was an idea that having no name/comments is fine because anyway
it is not difficult to figure out when using a "modern IDE", but since
my review tools are only "vi" and "meld" I beg to differ with that
justification.
(same comment applies multiple places, in this file and in tablesync.c)
Already responded to it separately. I went ahead and removed such
comments from other places in the patch.
[et0205] /messages/by-id/c1d9833f-eeeb-40d5-89ba-87674e1b7ba3@www.fastmail.com
=====
FILE: tablesync.c
5.
Previously there was a function tablesync_replorigin_drop which was
encapsulating the tablesync origin name formatting. I thought that was
better than the V29 code which now has the same formatting scattered
over many places.
(same comment applies for worker_internal.h)
I am not sure what you are expecting here that is different from point-3?
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We do append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}

6.
"We do append" --> "We append"
"With current scheme" -> "With the current scheme"
Fixed.
7.
Maybe consider just assigning GetSystemIdentifier() to a static
instead of calling that function for every slot?
static uint64 sysid = GetSystemIdentifier();
IIUC the sysid value is never going to change for a process, right?
Already responded.
FILE: alter_subscription.sgml
8.
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>

My guess is those two lots of double dots ("..") were probably meant
to be ellipsis ("...")
Fixed; for the first one I completed the command by adding PUBLICATION.
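So in v30 those lines now read:

+ Commands <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with refresh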
When looking at the DropSubscription code I noticed that there is a
small difference between the HEAD code and the V29 code when slot_name
= NONE.

HEAD does
------
if (!slotname)
{
table_close(rel, NoLock);
return;
}
------

V29 does
------
if (!slotname)
{
/* be tidy */
list_free(rstates);
return;
}
------

Isn't the V29 code missing doing a table_close(rel, NoLock) there?
Yes, good catch. Fixed.
--
With Regards,
Amit Kapila.
Attachments:
v30-0001-Make-pg_replication_origin_drop-safe-against-con.patchapplication/octet-stream; name=v30-0001-Make-pg_replication_origin_drop-safe-against-con.patchDownload
From ed0e6a09150220762517441d90173c4b85d61687 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 5 Feb 2021 12:20:13 +0530
Subject: [PATCH v30 1/2] Make pg_replication_origin_drop safe against
concurrent drops.
Currently, we get the origin id from the name and then drop the origin by
taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent
sessions can get the id from the name at the same time and then when they
try to drop the origin, one of the sessions will get the either
"tuple concurrently deleted" or "cache lookup failed for replication
origin ..".
To prevent this race condition we do the entire operation under lock. This
obviates the need for replorigin_drop() API and we have removed it so if
any extension authors are using it they need to instead use
replorigin_drop_by_name. See it's usage in pg_replication_origin_drop().
Author: Peter Smith
Reviewed-by: Amit Kapila, Euler Taveira, Petr Jelinek, and Alvaro
Herrera
Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com
---
src/backend/commands/subscriptioncmds.c | 5 +-
src/backend/replication/logical/origin.c | 59 +++++++++++++++---------
src/include/replication/origin.h | 2 +-
3 files changed, 38 insertions(+), 28 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..5ccbc9dd50 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -926,7 +926,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ListCell *lc;
char originname[NAMEDATALEN];
char *err = NULL;
- RepOriginId originid;
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
@@ -1050,9 +1049,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
/* Remove the origin tracking if exists. */
snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ replorigin_drop_by_name(originname, true, false);
/*
* If there is no slot associated with the subscription, we can finish
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 9bd761a426..685eaa6134 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -322,27 +322,15 @@ replorigin_create(char *roname)
return roident;
}
-
/*
- * Drop replication origin.
- *
- * Needs to be called in a transaction.
+ * Helper function to drop a replication origin.
*/
-void
-replorigin_drop(RepOriginId roident, bool nowait)
+static void
+replorigin_drop_guts(Relation rel, RepOriginId roident, bool nowait)
{
HeapTuple tuple;
- Relation rel;
int i;
- Assert(IsTransactionState());
-
- /*
- * To interlock against concurrent drops, we hold ExclusiveLock on
- * pg_replication_origin throughout this function.
- */
- rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
-
/*
* First, clean up the slot state info, if there is any matching slot.
*/
@@ -415,11 +403,40 @@ restart:
ReleaseSysCache(tuple);
CommandCounterIncrement();
-
- /* now release lock again */
- table_close(rel, ExclusiveLock);
}
+/*
+ * Drop replication origin (by name).
+ *
+ * Needs to be called in a transaction.
+ */
+void
+replorigin_drop_by_name(char *name, bool missing_ok, bool nowait)
+{
+ RepOriginId roident;
+ Relation rel;
+
+ Assert(IsTransactionState());
+
+ /*
+ * To interlock against concurrent drops, we hold ExclusiveLock on
+ * pg_replication_origin till xact commit.
+ *
+ * XXX We can optimize this by acquiring the lock on a specific origin by
+ * using LockSharedObject if required. However, for that, we first need to
+ * acquire a lock on ReplicationOriginRelationId, get the origin_id, lock
+ * the specific origin and then re-check if the origin still exists.
+ */
+ rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
+
+ roident = replorigin_by_name(name, missing_ok);
+
+ if (OidIsValid(roident))
+ replorigin_drop_guts(rel, roident, nowait);
+
+ /* We keep the lock on pg_replication_origin until commit */
+ table_close(rel, NoLock);
+}
/*
* Lookup replication origin via its oid and return the name.
@@ -1256,16 +1273,12 @@ Datum
pg_replication_origin_drop(PG_FUNCTION_ARGS)
{
char *name;
- RepOriginId roident;
replorigin_check_prerequisites(false, false);
name = text_to_cstring((text *) DatumGetPointer(PG_GETARG_DATUM(0)));
- roident = replorigin_by_name(name, false);
- Assert(OidIsValid(roident));
-
- replorigin_drop(roident, true);
+ replorigin_drop_by_name(name, false, true);
pfree(name);
diff --git a/src/include/replication/origin.h b/src/include/replication/origin.h
index 731445ae8f..d2ed6305fe 100644
--- a/src/include/replication/origin.h
+++ b/src/include/replication/origin.h
@@ -40,7 +40,7 @@ extern PGDLLIMPORT TimestampTz replorigin_session_origin_timestamp;
/* API for querying & manipulating replication origins */
extern RepOriginId replorigin_by_name(char *name, bool missing_ok);
extern RepOriginId replorigin_create(char *name);
-extern void replorigin_drop(RepOriginId roident, bool nowait);
+extern void replorigin_drop_by_name(char *name, bool missing_ok, bool nowait);
extern bool replorigin_by_oid(RepOriginId roident, bool missing_ok,
char **roname);
--
2.28.0.windows.1
v30-0002-Allow-multiple-xacts-during-table-sync-in-logica.patchapplication/octet-stream; name=v30-0002-Allow-multiple-xacts-during-table-sync-in-logica.patchDownload
From 5d10aba91b4de33d65dfcf5ba436df3987557758 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 6 Feb 2021 17:16:19 +0530
Subject: [PATCH v30 2/2] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not
finished.
The commands ALTER SUBSCRIPTION ... REFRESH .. and
ALTER SUBSCRIPTION ... SET PUBLICATION .. with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we couldn't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 59 ++-
doc/src/sgml/ref/alter_subscription.sgml | 20 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 ++
src/backend/commands/subscriptioncmds.c | 469 ++++++++++++++++----
src/backend/replication/logical/launcher.c | 147 ------
src/backend/replication/logical/tablesync.c | 242 ++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/test/regress/expected/subscription.out | 21 +
src/test/regress/sql/subscription.sql | 22 +
src/test/subscription/t/004_sync.pl | 21 +-
src/tools/pgindent/typedefs.list | 1 -
20 files changed, 760 insertions(+), 328 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..692ad65de2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..d0742f2c52 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -186,9 +186,10 @@
<para>
Each subscription will receive changes via one replication slot (see
- <xref linkend="streaming-replication-slots"/>). Additional temporary
- replication slots may be required for the initial data synchronization
- of pre-existing table data.
+ <xref linkend="streaming-replication-slots"/>). Additional replication
+ slots may be required for the initial data synchronization of
+ pre-existing table data and those will be dropped at the end of data
+ synchronization.
</para>
<para>
@@ -248,13 +249,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
@@ -468,16 +480,19 @@
<sect2 id="logical-replication-snapshot">
<title>Initial Snapshot</title>
<para>
- The initial data in existing subscribed tables are snapshotted and
- copied in a parallel instance of a special kind of apply process.
- This process will create its own temporary replication slot and
- copy the existing data. Once existing data is copied, the worker
- enters synchronization mode, which ensures that the table is brought
- up to a synchronized state with the main apply process by streaming
- any changes that happened during the initial data copy using standard
- logical replication. Once the synchronization is done, the control
- of the replication of the table is given back to the main apply
- process where the replication continues as normal.
+ The initial data in existing subscribed tables are snapshotted and
+ copied in a parallel instance of a special kind of apply process.
+ This process will create its own replication slot and copy the existing
+ data. As soon as the copy is finished the table contents will become
+ visible to other backends. Once existing data is copied, the worker
+ enters synchronization mode, which ensures that the table is brought
+ up to a synchronized state with the main apply process by streaming
+ any changes that happened during the initial data copy using standard
+ logical replication. During this synchronization phase, the changes
+ are applied and committed in the same order as they happened on the
+ publisher. Once the synchronization is done, the control of the
+ replication of the table is given back to the main apply process where
+ the replication continues as normal.
</para>
</sect2>
</sect1>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..1ca2437b48 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,26 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, a WARNING will be reported. The user needs to
+ manually remove such slots later or the
+ <xref linkend="guc-max-slot-wal-keep-size"/> should be configured on the
+ remote host as otherwise, they will continue to reserve WAL and might
+ eventually cause the disk to fill up. See also
+ <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..4f567fd221 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5ccbc9dd50..8046153371 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,212 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots and
+ * origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ char originname[NAMEDATALEN];
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for
+ * tablesync worker, this can happen for the states before
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time the
+ * origin might be already removed. For these reasons,
+ * passing missing_ok = true.
+ */
+ ReplicationOriginNameForTableSync(sub->oid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ *
+ * XXX If there is a network breakdown while dropping the
+ * slots then we will give a WARNING to the user and they need
+ * to manually remove such slots. This can happen so rarely to
+ * worry about and we don't have any better way to deal with
+ * this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +956,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +987,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -927,8 +1039,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char originname[NAMEDATALEN];
char *err = NULL;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1041,6 +1153,36 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for tablesync
+ * worker so passing missing_ok = true. This can happen for the states
+ * before SUBREL_STATE_FINISHEDCOPY.
+ */
+ ReplicationOriginNameForTableSync(subid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1055,30 +1197,109 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ table_close(rel, NoLock);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be
+ * deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1086,27 +1307,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1275,3 +1506,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf08f..e89e522118 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -269,26 +277,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to give an error if we are unable to drop the slot,
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped. So passing missing_ok = false.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
finish_sync_worker();
}
else
@@ -403,6 +437,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -411,6 +447,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ *
+ * There is a chance that the user is concurrently performing
+ * refresh for the subscription where we remove the table
+ * state and its origin and by this time the origin might be
+ * already removed. So passing missing_ok = true.
+ */
+ ReplicationOriginNameForTableSync(MyLogicalRepWorker->subid,
+ rstate->relid,
+ originname);
+ replorigin_drop_by_name(originname, true, false);
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -805,6 +862,50 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With the current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
+ char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
+
+/*
+ * Form the origin name for tablesync.
+ *
+ * Return the name in the supplied buffer.
+ */
+void
+ReplicationOriginNameForTableSync(Oid suboid, Oid relid,
+ char originname[NAMEDATALEN])
+{
+ snprintf(originname, NAMEDATALEN, "pg_%u_%u", suboid, relid);
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -822,6 +923,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +950,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +966,50 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ ReplicationOriginNameForTableSync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ originname);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1025,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,12 +1050,45 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
+
+ /*
+ * Setup replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */, true /* WAL log */);
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -940,6 +1107,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c52aa..ed94f57baa 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..8d9cc4f596 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
+extern void ReplicationOriginNameForTableSync(Oid suboid, Oid relid, char *originname);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce66a..7802279cb2 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b247e..ca0d782742 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,28 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9181..c7926681b6 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 8;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,26 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Table tap_rep already has the same records on both publisher and subscriber
+# at this time. Recreate the subscription which will do the initial copy of
+# the table again and fails due to unique constraint violation.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# DROP SUBSCRIPTION must clean up slots on the publisher side when the
+# subscriber is stuck on data copy for constraint violation.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..5f5c36d8e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
2.28.0.windows.1
On Tue, Feb 9, 2021 at 8:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 9, 2021 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
Here are my feedback comments for the V29 patch.
Thanks.
3.
Previously the tablesync origin name format was encapsulated in a
common function. IMO it was cleaner/safer how it was before, instead
of the same "pg_%u_%u" format being cut/pasted and scattered in many places.
(the same comment applies in multiple places, in this file and in tablesync.c)
OK. I confirmed it is fixed in V30.
But I noticed that the new function name is not quite consistent with
existing function for slot name. e.g.
ReplicationSlotNameForTablesync versus
ReplicationOriginNameForTableSync (see "TableSync" instead of
"Tablesync")
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 8, 2021 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Feb 7, 2021 at 2:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
Some minor comments about the code:
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here. Because this can for example return an error
because of a timeout, not because the slot is missing. Not sure if it matters
for the current callers though (but then maybe don't call the param
missing_ok?).
You are right. The way we are using this function has evolved beyond
the original intention.
Probably renaming the param to something like "error_ok" would be more
appropriate now.
PSA a patch (apply on top of V28) to change the misleading param name.
PSA an alternative patch. This one adds a new member to
WalRcvExecResult and so is able to detect the "slot does not exist"
error. This patch also applies on top of V28, if you want it.
PSA v2 of this WalRcvExecResult patch (it is the same as v1 but includes
some PG doc updates).
This applies OK on top of v30 of the main patch.
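To spell out the decision logic the new member enables, a minimal
standalone sketch follows (the types, the integer ERRCODE value, and
drop_slot_outcome() are simplified stand-ins for the real definitions in
the attached patch, which uses ereport() and MAKE_SQLSTATE()):

#include <stdio.h>
#include <stdbool.h>

/* Stand-ins for the walreceiver definitions the v2 patch touches. */
typedef enum { WALRCV_OK_COMMAND, WALRCV_ERROR } WalRcvExecStatus;

typedef struct
{
    WalRcvExecStatus status;
    int         sqlstate;       /* the new member */
    const char *err;
} WalRcvExecResult;

/* Illustrative stand-in; the real value is MAKE_SQLSTATE('4','2','7','0','4'). */
#define ERRCODE_UNDEFINED_OBJECT 42704

static void
drop_slot_outcome(const WalRcvExecResult *res, bool missing_ok, const char *slotname)
{
    if (res->status == WALRCV_OK_COMMAND)
        printf("NOTICE: dropped replication slot \"%s\" on publisher\n", slotname);
    else if (res->status == WALRCV_ERROR &&
             missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
        /* Only the "slot does not exist" error is downgraded to a WARNING. */
        printf("WARNING: could not drop the replication slot \"%s\" on publisher (%s)\n",
               slotname, res->err);
    else
        /* Anything else (e.g. a timeout) still fails, even with missing_ok. */
        printf("ERROR: could not drop the replication slot \"%s\" on publisher (%s)\n",
               slotname, res->err);
}

int main(void)
{
    WalRcvExecResult gone = {WALRCV_ERROR, ERRCODE_UNDEFINED_OBJECT, "slot does not exist"};
    WalRcvExecResult timeout = {WALRCV_ERROR, 0, "timeout"};

    drop_slot_outcome(&gone, true, "pg_16401_sync_16389_6927117142022745645");
    drop_slot_outcome(&timeout, true, "pg_16401_sync_16389_6927117142022745645");
    return 0;
}

Note that with missing_ok = true an unrelated failure such as a timeout
still escalates to an ERROR, which is what the concern above was about.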
------
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v2-0001-ReplicationSlotDropAtPubNode-detect-slot-does-not.patch
From c092a3cf55085725924417af1bb5c11dbcba31e2 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 10 Feb 2021 13:02:06 +1100
Subject: [PATCH v2] ReplicationSlotDropAtPubNode detect slot does not exist.
A new sqlstate member was added to WalRcvExecResult.
This allows walrcv_exec calling code to know the detailed cause of any error. Specifically, here it means the ReplicationSlotDropAtPubNode function can now identify the "slot does not exist" error, and so can handle "missing_ok" more correctly.
Also, minor updates to PG docs now that ALTER SUBSCRIPTION may give ERROR instead of WARNING.
---
doc/src/sgml/ref/alter_subscription.sgml | 7 +------
src/backend/commands/subscriptioncmds.c | 3 ++-
src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 8 ++++++++
src/include/replication/walreceiver.h | 1 +
4 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 1ca2437..bd81ea4 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -55,12 +55,7 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
any. It is necessary to remove tablesync slots so that the resources
allocated for the subscription on the remote host are released. If due to
network breakdown or some other error, <productname>PostgreSQL</productname>
- is unable to remove the slots, a WARNING will be reported. The user needs to
- manually remove such slots later or the
- <xref linkend="guc-max-slot-wal-keep-size"/> should be configured on the
- remote host as otherwise, they will continue to reserve WAL and might
- eventually cause the disk to fill up. See also
- <xref linkend="logical-replication-subscription-slot"/>.
+ is unable to remove the slots, an ERROR will be reported.
</para>
<para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 8046153..ff295c7 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1314,7 +1314,8 @@ ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missi
(errmsg("dropped replication slot \"%s\" on publisher",
slotname)));
}
- else if (res->status == WALRCV_ERROR && missing_ok)
+ else if (res->status == WALRCV_ERROR &&
+ missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
{
/* WARNING. Error, but missing_ok = true. */
ereport(WARNING,
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e958274..7714696 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -982,6 +982,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
{
PGresult *pgres = NULL;
WalRcvExecResult *walres = palloc0(sizeof(WalRcvExecResult));
+ char *diag_sqlstate;
if (MyDatabaseId == InvalidOid)
ereport(ERROR,
@@ -1025,6 +1026,13 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
case PGRES_BAD_RESPONSE:
walres->status = WALRCV_ERROR;
walres->err = pchomp(PQerrorMessage(conn->streamConn));
+ diag_sqlstate = PQresultErrorField(pgres, PG_DIAG_SQLSTATE);
+ if (diag_sqlstate)
+ walres->sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+ diag_sqlstate[1],
+ diag_sqlstate[2],
+ diag_sqlstate[3],
+ diag_sqlstate[4]);
break;
}
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 4313f51..a97a59a 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -210,6 +210,7 @@ typedef enum
typedef struct WalRcvExecResult
{
WalRcvExecStatus status;
+ int sqlstate;
char *err;
Tuplestorestate *tuplestore;
TupleDesc tupledesc;
--
1.8.3.1
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA an alternative patch. This one adds a new member to
WalRcvExecResult and so is able to detect the "slot does not exist"
error. This patch also applies on top of V28, if you want it.
I did some testing with this patch on top of v29. I could see that now,
while dropping the subscription, if the tablesync slot does not exist
on the publisher, it gives a warning but the command does not fail.
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub disable;
ALTER SUBSCRIPTION
=== here, the tablesync slot exists on the publisher but I go and
=== manually drop it.
postgres=# drop subscription tap_sub;
WARNING: could not drop the replication slot
"pg_16401_sync_16389_6927117142022745645" on publisher
DETAIL: The error was: ERROR: replication slot
"pg_16401_sync_16389_6927117142022745645" does not exist
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION
I have a minor comment on the error message: the "The error was:"
prefix seems a bit redundant here. Maybe remove it, so that it looks like:
WARNING: could not drop the replication slot
"pg_16401_sync_16389_6927117142022745645" on publisher
DETAIL: ERROR: replication slot
"pg_16401_sync_16389_6927117142022745645" does not exist
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA v2 of this WalRcvExecResult patch (it is the same as v1 but includes
some PG doc updates).
This applies OK on top of v30 of the main patch.
Thanks, I have integrated these changes into the main patch and
additionally made some changes to comments and docs. I have also fixed
the function name inconsistency issue you reported and ran pgindent.
--
With Regards,
Amit Kapila.
Attachments:
v31-0001-Allow-multiple-xacts-during-table-sync-in-logica.patch
From 31691bef5e895ad251325e30ea9e6d8379672695 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 10 Feb 2021 09:20:31 +0530
Subject: [PATCH v31] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive it. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not
finished.
The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 59 ++-
doc/src/sgml/ref/alter_subscription.sgml | 18 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 39 ++
src/backend/commands/subscriptioncmds.c | 467 ++++++++++++++++-----
.../libpqwalreceiver/libpqwalreceiver.c | 8 +
src/backend/replication/logical/launcher.c | 147 -------
src/backend/replication/logical/tablesync.c | 236 +++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/walreceiver.h | 1 +
src/include/replication/worker_internal.h | 3 +-
src/test/regress/expected/subscription.out | 21 +
src/test/regress/sql/subscription.sql | 22 +
src/test/subscription/t/004_sync.pl | 21 +-
src/tools/pgindent/typedefs.list | 2 +-
22 files changed, 767 insertions(+), 325 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0..692ad65 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..d0742f2 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -186,9 +186,10 @@
<para>
Each subscription will receive changes via one replication slot (see
- <xref linkend="streaming-replication-slots"/>). Additional temporary
- replication slots may be required for the initial data synchronization
- of pre-existing table data.
+ <xref linkend="streaming-replication-slots"/>). Additional replication
+ slots may be required for the initial data synchronization of
+ pre-existing table data and those will be dropped at the end of data
+ synchronization.
</para>
<para>
@@ -248,13 +249,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
@@ -468,16 +480,19 @@
<sect2 id="logical-replication-snapshot">
<title>Initial Snapshot</title>
<para>
- The initial data in existing subscribed tables are snapshotted and
- copied in a parallel instance of a special kind of apply process.
- This process will create its own temporary replication slot and
- copy the existing data. Once existing data is copied, the worker
- enters synchronization mode, which ensures that the table is brought
- up to a synchronized state with the main apply process by streaming
- any changes that happened during the initial data copy using standard
- logical replication. Once the synchronization is done, the control
- of the replication of the table is given back to the main apply
- process where the replication continues as normal.
+ The initial data in existing subscribed tables are snapshotted and
+ copied in a parallel instance of a special kind of apply process.
+ This process will create its own replication slot and copy the existing
+ data. As soon as the copy is finished the table contents will become
+ visible to other backends. Once existing data is copied, the worker
+ enters synchronization mode, which ensures that the table is brought
+ up to a synchronized state with the main apply process by streaming
+ any changes that happened during the initial data copy using standard
+ logical replication. During this synchronization phase, the changes
+ are applied and committed in the same order as they happened on the
+ publisher. Once the synchronization is done, the control of the
+ replication of the table is given back to the main apply process where
+ the replication continues as normal.
</para>
</sect2>
</sect1>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f..bcb0acf 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,24 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, an ERROR will be reported. To proceed in this
+ situation, either the user need to retry the operation or disassociate the
+ slot from the subscription and drop the subscription as explained in
+ <xref linkend="sql-dropsubscription"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3..3c8b4eb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..750ec2a 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,35 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+
+ /*
+ * translator: first %s is a SQL ALTER command and second %s is a
+ * SQL DROP command
+ */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5ccbc9d..7996f84 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,207 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots
+ * and origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to
+ * prevent any race conditions with the apply worker
+ * re-launching workers at the same time this code is trying
+ * to remove those tables.
+ *
+ * Even if a new worker for this particular rel is restarted, it
+ * won't be able to make any progress, as we hold an exclusive
+ * lock on subscription_rel till the transaction end. It will
+ * simply exit, as there is no corresponding rel entry.
+ *
+ * This locking also ensures that the state of rels won't
+ * change till we are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the
+ * tablesync origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ char originname[NAMEDATALEN];
+
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ *
+ * It is possible that the origin has not yet been created for
+ * the tablesync worker; this can happen for the states before
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin, and by this time the
+ * origin might already be removed. For these reasons, we pass
+ * missing_ok = true.
+ */
+ ReplicationOriginNameForTablesync(sub->oid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has
+ * to be done at the end because otherwise, if there is an error while
+ * doing the database operations, we won't be able to roll back dropped
+ * slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty; the slot may not
+ * exist yet. Also, if we fail after removing some of the slots,
+ * the next attempt will again try to drop already-dropped slots
+ * and fail. For these reasons, we allow missing_ok = true for
+ * the drop.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +951,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +982,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -927,8 +1034,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char originname[NAMEDATALEN];
char *err = NULL;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1041,6 +1148,36 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ *
+ * It is possible that the origin has not yet been created for the
+ * tablesync worker, so we pass missing_ok = true. This can happen for
+ * the states before SUBREL_STATE_FINISHEDCOPY.
+ */
+ ReplicationOriginNameForTablesync(subid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1055,30 +1192,110 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually clean up the apply and tablesync worker
+ * slots later.
+ *
+ * This has to be done at the end because otherwise, if there is an error
+ * while doing the database operations, we won't be able to roll back the
+ * dropped slots.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ table_close(rel, NoLock);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty; the slot may not exist
+ * yet. Also, if we fail after removing some of the slots, the next
+ * attempt will again try to drop already-dropped slots and fail.
+ * For these reasons, we allow missing_ok = true for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue a WARNING message if the slot doesn't
+ * exist.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1086,27 +1303,39 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR &&
+ missing_ok &&
+ res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1275,3 +1504,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report a WARNING for each tablesync slot so that the user
+ * can drop them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e958274..7714696 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -982,6 +982,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
{
PGresult *pgres = NULL;
WalRcvExecResult *walres = palloc0(sizeof(WalRcvExecResult));
+ char *diag_sqlstate;
if (MyDatabaseId == InvalidOid)
ereport(ERROR,
@@ -1025,6 +1026,13 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
case PGRES_BAD_RESPONSE:
walres->status = WALRCV_ERROR;
walres->err = pchomp(PQerrorMessage(conn->streamConn));
+ diag_sqlstate = PQresultErrorField(pgres, PG_DIAG_SQLSTATE);
+ if (diag_sqlstate)
+ walres->sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+ diag_sqlstate[1],
+ diag_sqlstate[2],
+ diag_sqlstate[3],
+ diag_sqlstate[4]);
break;
}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514c..58082dd 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -547,51 +533,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
}
/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
-/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
void
@@ -820,109 +761,21 @@ ApplyLauncherShmemInit(void)
}
/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
-/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
-/*
* Request wakeup of the launcher on commit of the transaction.
*
* This is used to send launcher signal to stop sleeping and process the
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf..19cc804 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -269,26 +277,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Clean up the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise, if
+ * there is an error while doing the database operations, we won't be
+ * able to roll back the dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to raise an error if we are unable to drop the slot;
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped. So we pass missing_ok = false.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
finish_sync_worker();
}
else
@@ -403,6 +437,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -411,6 +447,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * In the normal case, the origin drop is done here instead of in
+ * process_syncing_tables_for_sync because an origin cannot be
+ * dropped while the process owning the origin is still alive.
+ *
+ * There is a chance that the user is concurrently performing a
+ * refresh of the subscription, which removes the table state and
+ * its origin, and by this time the origin might already be
+ * removed. So we pass missing_ok = true.
+ */
+ ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
+ rstate->relid,
+ originname);
+ replorigin_drop_by_name(originname, true, false);
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -806,6 +863,50 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With the current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to clean up if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
+ char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
+
+/*
+ * Form the origin name for tablesync.
+ *
+ * Return the name in the supplied buffer.
+ */
+void
+ReplicationOriginNameForTablesync(Oid suboid, Oid relid,
+ char originname[NAMEDATALEN])
+{
+ snprintf(originname, NAMEDATALEN, "pg_%u_%u", suboid, relid);
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -822,6 +923,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +950,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +966,50 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ ReplicationOriginNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ originname);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1025,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,13 +1050,46 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
CRS_USE_SNAPSHOT, origin_startpos);
+ /*
+ * Set up replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
+ */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
copy_table(rel);
@@ -941,6 +1108,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommandCounterIncrement();
/*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ /*
* We are done with the initial data synchronization, update the state.
*/
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c5..ed94f57 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec15..301e494 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 4313f51..a97a59a 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -210,6 +210,7 @@ typedef enum
typedef struct WalRcvExecResult
{
WalRcvExecStatus status;
+ int sqlstate;
char *err;
Tuplestorestate *tuplestore;
TupleDesc tupledesc;
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..4a5adc2 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
+extern void ReplicationOriginNameForTablesync(Oid suboid, Oid relid, char *originname);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..7802279 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..ca0d782 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,28 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..c792668 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 8;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,26 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Table tab_rep already has the same records on both publisher and subscriber
+# at this time. Recreate the subscription; it will attempt the initial copy of
+# the table again and fail due to a unique constraint violation.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# DROP SUBSCRIPTION must clean up slots on the publisher side when the
+# subscriber is stuck on the data copy due to a constraint violation.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe..bab4f3a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
@@ -2408,6 +2407,7 @@ SubLink
SubLinkType
SubPlan
SubPlanState
+SubRemoveRels
SubTransactionId
SubXactCallback
SubXactCallbackItem
--
1.8.3.1
I have reviewed the latest patch (V31) again.
I found only a few minor nitpick issues, not worth listing.
Then I ran the subscription TAP tests 50x in a loop as a kind of
stress test. That ran for 2.5 hours, and all 50 runs ended with
'Result: PASS'.
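(A loop of roughly this shape reproduces such a run; a sketch assuming an
in-tree build configured with --enable-tap-tests, not necessarily the
exact invocation used:)

    for i in $(seq 1 50); do
        make -C src/test/subscription check || break
    done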
So V31 looks good to me.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On 10 Feb 2021, at 06:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA v2 of this WalRcvExceResult patch (it is same as v1 but includes
some PG doc updates).
This applies OK on top of v30 of the main patch.

Thanks, I have integrated these changes into the main patch and
additionally made some changes to comments and docs. I have also fixed
the function name inconsistency issue you reported and ran pgindent.
One thing:
+        else if (res->status == WALRCV_ERROR &&
+                 missing_ok &&
+                 res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
+        {
+            /* WARNING. Error, but missing_ok = true. */
+            ereport(WARNING,
+                    (errmsg("could not drop the replication slot \"%s\" on publisher",
+                            slotname),
+                     errdetail("The error was: %s", res->err)));
Hmm, why is this WARNING, we mostly call it with missing_ok = true when the slot is not expected to be there, so it does not seem correct to report it as warning?
--
Petr
On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hmm, why is this WARNING, we mostly call it with missing_ok = true when
the slot is not expected to be there, so it does not seem correct to
report it as warning?
WARNING is for the cases where we don't always expect slots to exist
and we don't want to stop the operation due to it. For example, in
DropSubscription, for some of the rel states like (SUBREL_STATE_INIT
and SUBREL_STATE_DATASYNC), the slot won't exist. Similarly, say if we
fail (due to network error) after removing some of the slots, next
time, it will again try to drop already dropped slots and fail. For
these reasons, we need to use WARNING. Similarly for tablesync workers
when we are trying to initially drop the slot there is no certainty
that it exists, so we can't throw ERROR and stop the operation there.
There are other cases like when the table sync worker has finished
syncing the table, there we will raise an ERROR if the slot doesn't
exist. Does this make sense?
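Reduced to a sketch, the intended mapping looks like this (a hypothetical
helper, not code from the patch; the PostgreSQL types and macros are
assumed from the usual headers):

    /*
     * Sketch only: how the drop-slot outcome maps to a message level in
     * ReplicationSlotDropAtPubNode-style logic.
     */
    static int
    drop_slot_elevel(WalRcvExecResult *res, bool missing_ok)
    {
        if (res->status == WALRCV_OK_COMMAND)
            return NOTICE;      /* slot existed and was dropped */
        else if (missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
            return WARNING;     /* slot absent, tolerated when missing_ok */
        else
            return ERROR;       /* unexpected failure; stop the operation */
    }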
--
With Regards,
Amit Kapila.
On 11 Feb 2021, at 10:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
WARNING is for the cases where we don't always expect slots to exist
and we don't want to stop the operation due to it. [...] Does this
make sense?
Well, I was thinking it could be NOTICE or LOG to be honest, WARNING seems unnecessarily scary for those usecases to me.
—
Petr
On Thu, Feb 11, 2021 at 3:20 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Well, I was thinking it could be NOTICE or LOG to be honest, WARNING
seems unnecessarily scary for those usecases to me.
I am fine with LOG and will make that change. Do you have any more
comments or want to spend more time on this patch before we call it
good?
--
With Regards,
Amit Kapila.
On 11 Feb 2021, at 10:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
I am fine with LOG and will make that change. Do you have any more
comments or want to spend more time on this patch before we call it
good?
I am good, thanks!
—
Petr
On Thu, Feb 11, 2021 at 3:32 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
I am good, thanks!
Okay, attached an updated patch with only that change.
--
With Regards,
Amit Kapila.
Attachments:
v32-0001-Allow-multiple-xacts-during-table-sync-in-logica.patch (application/octet-stream)
From 53f6eecba9733df6eb7041813caba5789ad3c037 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 10 Feb 2021 09:20:31 +0530
Subject: [PATCH v32] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive it. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove the slots
and origins of tablesync workers if the user performs DROP SUBSCRIPTION ..
or ALTER SUBSCRIPTION .. REFRESH while some of the table syncs are still
not finished.
The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
slots for which we have no provision to roll back.
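For illustration (subscription name hypothetical; the error text matches
the regression test added below):

    BEGIN;
    ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
    ERROR:  ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block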
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we couldn't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 59 ++-
doc/src/sgml/ref/alter_subscription.sgml | 18 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 39 ++
src/backend/commands/subscriptioncmds.c | 467 ++++++++++++++----
.../libpqwalreceiver/libpqwalreceiver.c | 8 +
src/backend/replication/logical/launcher.c | 147 ------
src/backend/replication/logical/tablesync.c | 236 ++++++++-
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/walreceiver.h | 1 +
src/include/replication/worker_internal.h | 3 +-
src/test/regress/expected/subscription.out | 21 +
src/test/regress/sql/subscription.sql | 22 +
src/test/subscription/t/004_sync.pl | 21 +-
src/tools/pgindent/typedefs.list | 2 +-
22 files changed, 767 insertions(+), 325 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..692ad65de2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..d0742f2c52 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -186,9 +186,10 @@
<para>
Each subscription will receive changes via one replication slot (see
- <xref linkend="streaming-replication-slots"/>). Additional temporary
- replication slots may be required for the initial data synchronization
- of pre-existing table data.
+ <xref linkend="streaming-replication-slots"/>). Additional replication
+ slots may be required for the initial data synchronization of
+ pre-existing table data and those will be dropped at the end of data
+ synchronization.
</para>
<para>
@@ -248,13 +249,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
@@ -468,16 +480,19 @@
<sect2 id="logical-replication-snapshot">
<title>Initial Snapshot</title>
<para>
- The initial data in existing subscribed tables are snapshotted and
- copied in a parallel instance of a special kind of apply process.
- This process will create its own temporary replication slot and
- copy the existing data. Once existing data is copied, the worker
- enters synchronization mode, which ensures that the table is brought
- up to a synchronized state with the main apply process by streaming
- any changes that happened during the initial data copy using standard
- logical replication. Once the synchronization is done, the control
- of the replication of the table is given back to the main apply
- process where the replication continues as normal.
+ The initial data in existing subscribed tables are snapshotted and
+ copied in a parallel instance of a special kind of apply process.
+ This process will create its own replication slot and copy the existing
+ data. As soon as the copy is finished the table contents will become
+ visible to other backends. Once existing data is copied, the worker
+ enters synchronization mode, which ensures that the table is brought
+ up to a synchronized state with the main apply process by streaming
+ any changes that happened during the initial data copy using standard
+ logical replication. During this synchronization phase, the changes
+ are applied and committed in the same order as they happened on the
+ publisher. Once the synchronization is done, the control of the
+ replication of the table is given back to the main apply process where
+ the replication continues as normal.
</para>
</sect2>
</sect1>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..bcb0acf28d 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,24 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, an ERROR will be reported. To proceed in this
+ situation, the user needs to either retry the operation or disassociate the
+ slot from the subscription and drop the subscription as explained in
+ <xref linkend="sql-dropsubscription"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..750ec2ac17 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,35 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+
+ /*
+ * translator: first %s is a SQL ALTER command and second %s is a
+ * SQL DROP command
+ */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5ccbc9dd50..5cf874e0b4 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,207 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots
+ * and origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to
+ * prevent any race conditions with the apply worker
+ * re-launching workers at the same time this code is trying
+ * to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it
+ * won't be able to make any progress as we hold exclusive
+ * lock on subscription_rel till the transaction end. It will
+ * simply exit as there is no corresponding rel entry.
+ *
+ * This locking also ensures that the state of rels won't
+ * change till we are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the
+ * tablesync origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ char originname[NAMEDATALEN];
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for
+ * tablesync worker, this can happen for the states before
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time
+ * the origin might be already removed. For these reasons,
+ * passing missing_ok = true.
+ */
+ ReplicationOriginNameForTablesync(sub->oid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has
+ * to be at the end because otherwise if there is an error while doing
+ * the database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +951,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +982,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -927,8 +1034,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char originname[NAMEDATALEN];
char *err = NULL;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1041,6 +1148,36 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for tablesync
+ * worker so passing missing_ok = true. This can happen for the states
+ * before SUBREL_STATE_FINISHEDCOPY.
+ */
+ ReplicationOriginNameForTablesync(subid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1055,30 +1192,110 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped
+ * slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ table_close(rel, NoLock);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot does
+ * not exist yet. Also, if we fail after removing some of the
+ * slots, next time, it will again try to drop already dropped
+ * slots and fail. For these reasons, we allow missing_ok = true
+ * for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue a LOG message if the slot doesn't
+ * exist.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1086,27 +1303,39 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR &&
+ missing_ok &&
+ res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
+ {
+ /* LOG. Error, but missing_ok = true. */
+ ereport(LOG,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1275,3 +1504,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e958274861..7714696140 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -982,6 +982,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
{
PGresult *pgres = NULL;
WalRcvExecResult *walres = palloc0(sizeof(WalRcvExecResult));
+ char *diag_sqlstate;
if (MyDatabaseId == InvalidOid)
ereport(ERROR,
@@ -1025,6 +1026,13 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
case PGRES_BAD_RESPONSE:
walres->status = WALRCV_ERROR;
walres->err = pchomp(PQerrorMessage(conn->streamConn));
+ diag_sqlstate = PQresultErrorField(pgres, PG_DIAG_SQLSTATE);
+ if (diag_sqlstate)
+ walres->sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+ diag_sqlstate[1],
+ diag_sqlstate[2],
+ diag_sqlstate[3],
+ diag_sqlstate[4]);
break;
}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf08f..19cc804678 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -269,26 +277,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to give an error if we are unable to drop the slot,
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped. So passing missing_ok = false.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
finish_sync_worker();
}
else
@@ -403,6 +437,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -411,6 +447,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ *
+ * There is a chance that the user is concurrently performing
+ * refresh for the subscription where we remove the table
+ * state and its origin and by this time the origin might be
+ * already removed. So passing missing_ok = true.
+ */
+ ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
+ rstate->relid,
+ originname);
+ replorigin_drop_by_name(originname, true, false);
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -805,6 +862,50 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With the current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
+ char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
+
+/*
+ * Form the origin name for tablesync.
+ *
+ * Return the name in the supplied buffer.
+ */
+void
+ReplicationOriginNameForTablesync(Oid suboid, Oid relid,
+ char originname[NAMEDATALEN])
+{
+ snprintf(originname, NAMEDATALEN, "pg_%u_%u", suboid, relid);
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -822,6 +923,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +950,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +966,50 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ ReplicationOriginNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ originname);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1025,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,13 +1050,46 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
CRS_USE_SNAPSHOT, origin_startpos);
+ /*
+ * Setup replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
+ */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
copy_table(rel);
@@ -940,6 +1107,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c52aa..ed94f57baa 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 4313f516d3..a97a59a6a3 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -210,6 +210,7 @@ typedef enum
typedef struct WalRcvExecResult
{
WalRcvExecStatus status;
+ int sqlstate;
char *err;
Tuplestorestate *tuplestore;
TupleDesc tupledesc;
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5adc2fda 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
+extern void ReplicationOriginNameForTablesync(Oid suboid, Oid relid, char *originname);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce66a..7802279cb2 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b247e..ca0d782742 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,28 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9181..c7926681b6 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 8;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,26 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Table tab_rep already has the same records on both publisher and subscriber
+# at this time. Recreate the subscription which will do the initial copy of
+# the table again and fail due to a unique constraint violation.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# DROP SUBSCRIPTION must clean up slots on the publisher side when the
+# subscriber is stuck on data copy for constraint violation.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..bab4f3adb3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
@@ -2408,6 +2407,7 @@ SubLink
SubLinkType
SubPlan
SubPlanState
+SubRemoveRels
SubTransactionId
SubXactCallback
SubXactCallbackItem
--
2.28.0.windows.1
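For readers skimming the patch, here is a small SQL sketch of what changes
at the user level (the subscription name "mysub" is hypothetical; the 'f'
state and the pg_%u_sync_%u_%llu slot naming are taken from the patch above):

-- On the subscriber: the new 'f' (finished table copy) state shows up here.
SELECT srrelid::regclass, srsubstate FROM pg_subscription_rel;

-- On the publisher: tablesync slots are now permanent and follow
-- pg_<suboid>_sync_<relid>_<sysid>, so any leftovers are easy to spot.
SELECT slot_name FROM pg_replication_slots
WHERE slot_name LIKE 'pg\_%\_sync\_%';

-- The new restriction: refresh can no longer run inside a transaction block.
BEGIN;
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;  -- now fails with ERROR
ROLLBACK;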
On Thu, Feb 11, 2021 at 10:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Okay, attached an updated patch with only that change.
I ran Erik's test suite [1] on this patch overnight and found no
errors. No more comments from me. The patch looks good.
regards,
Ajin Cherian
Fujitsu Australia
[1]: /messages/by-id/93d02794068482f96d31b002e0eb248d@xs4all.nl
On Fri, Feb 12, 2021 at 7:18 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Thu, Feb 11, 2021 at 10:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Okay, attached an updated patch with only that change.
I ran Erik's test suite [1] on this patch overnight and found no
errors. No more comments from me. The patch looks good.
Thanks, I have pushed the patch, but I am getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
The reason seems to be that we are trying to connect and
max_wal_senders is set to zero. I think we can write this without
trying to connect. The attached patch fixes the problem for me. What
do you think?
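To spell out the idea in plain SQL (this is just the net effect of the
attached patch, shown without the diff markup): create the subscription
without connecting, so that no walsender is needed, and enable it
afterwards:

CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION mypub
    WITH (connect = false, create_slot = false, copy_data = false);
ALTER SUBSCRIPTION regress_testsub ENABLE;

With connect = false, CREATE SUBSCRIPTION doesn't open a replication
connection at all, so the test passes even when max_wal_senders = 0.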
--
With Regards,
Amit Kapila.
Attachments:
fix_subs_test_1.patch
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 7802279..14a4302 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,8 +201,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
-CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
- WITH (enabled = true, create_slot = false, copy_data = false);
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION mypub
+ WITH (connect = false, create_slot = false, copy_data = false);
+WARNING: tables were not subscribed, you will have to run ALTER SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
+ALTER SUBSCRIPTION regress_testsub ENABLE;
-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
-- block or function
BEGIN;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index ca0d782..81e65e5 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,8 +147,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
-CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
- WITH (enabled = true, create_slot = false, copy_data = false);
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION mypub
+ WITH (connect = false, create_slot = false, copy_data = false);
+
+ALTER SUBSCRIPTION regress_testsub ENABLE;
-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
-- block or function
On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the patch, but I am getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
The reason seems to be that we are trying to connect and
max_wal_senders is set to zero. I think we can write this without
trying to connect. The attached patch fixes the problem for me. What
do you think?
Verified this with installcheck after modifying the configuration to have
wal_level = minimal and max_wal_senders = 0.
Tests passed. The changes look good to me.
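For anyone re-running that check, those are the only non-default settings
involved; in postgresql.conf this is simply:

wal_level = minimal
max_wal_senders = 0

(wal_level = minimal requires max_wal_senders = 0 anyway; the server
refuses to start otherwise.)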
regards,
Ajin Cherian
Fujitsu Australia
On Fri, Feb 12, 2021 at 10:08 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the patch, but I am getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
The reason seems to be that we are trying to connect and
max_wal_senders is set to zero. I think we can write this without
trying to connect. The attached patch fixes the problem for me. What
do you think?
Verified this with installcheck after modifying the configuration to have
wal_level = minimal and max_wal_senders = 0.
Tests passed. The changes look good to me.
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
--
With Regards,
Amit Kapila.
On Fri, Feb 12, 2021 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 12, 2021 at 10:08 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the patch, but I am getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
The reason seems to be that we are trying to connect and
max_wal_senders is set to zero. I think we can write this without
trying to connect. The attached patch fixes the problem for me. What
do you think?
Verified this with installcheck after modifying the configuration to have
wal_level = minimal and max_wal_senders = 0.
Tests passed. The changes look good to me.
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
I got the following WARNING message from a logical replication apply worker:
WARNING: relcache reference leak: relation "pg_subscription_rel" not closed
The cause of this is that GetSubscriptionRelState() doesn't close the
relation in the SUBREL_STATE_UNKNOWN case. It seems that commit ce0fdbfe9
forgot to close it. I've attached the patch to fix this issue.
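For readers following along, here is roughly what the function looks like
with the fix applied (an abbreviated sketch, not the exact source; the
attached patch only adds the table_close() call in the early-return
branch):

char
GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
{
	Relation	rel;
	HeapTuple	tup;
	char		substate;
	Datum		d;
	bool		isnull;

	/* Opened to guard against a concurrent ALTER SUBSCRIPTION removing the state. */
	rel = table_open(SubscriptionRelRelationId, AccessShareLock);

	/* Try finding the mapping. */
	tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
						  ObjectIdGetDatum(relid),
						  ObjectIdGetDatum(subid));

	if (!HeapTupleIsValid(tup))
	{
		/* The fix: close the relation on this early-return path too. */
		table_close(rel, AccessShareLock);
		*sublsn = InvalidXLogRecPtr;
		return SUBREL_STATE_UNKNOWN;
	}

	/* Get the state. */
	substate = ((Form_pg_subscription_rel) GETSTRUCT(tup))->srsubstate;

	/* Get the LSN. */
	d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup,
						Anum_pg_subscription_rel_srsublsn, &isnull);
	*sublsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);

	/* Cleanup: release the syscache entry and close the relation. */
	ReleaseSysCache(tup);
	table_close(rel, AccessShareLock);

	return substate;
}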
Here are the steps to reproduce it:
1. On both publisher and subscriber:
create table test (a int primary key);
2. On publisher:
create publication test_pub for table test;
3. On subscriber:
create subscription test_sub connection 'dbname=postgres' publication test_pub;
-- wait until table sync has finished
drop table test;
create table test (a int primary key);
From this point on, you will get the WARNING message whenever you do an
insert/update/delete/truncate on the 'test' table on the publisher.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
fix_relcache_leak.patch
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index c32fc8137d..4039768865 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -353,6 +353,7 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
if (!HeapTupleIsValid(tup))
{
+ table_close(rel, AccessShareLock);
*sublsn = InvalidXLogRecPtr;
return SUBREL_STATE_UNKNOWN;
}
On Wed, Feb 24, 2021 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
I got the following WARNING message from a logical replication apply worker:
WARNING: relcache reference leak: relation "pg_subscription_rel" not closed
The cause of this is that GetSubscriptionRelState() doesn't close the
relation in the SUBREL_STATE_UNKNOWN case. It seems that commit ce0fdbfe9
forgot to close it. I've attached the patch to fix this issue.
Thanks for the report and fix. Your patch LGTM. I'll push it tomorrow.
--
With Regards,
Amit Kapila.
On Wed, Feb 24, 2021 at 5:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 24, 2021 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
I got the following WARNING message from a logical replication apply worker:
WARNING: relcache reference leak: relation "pg_subscription_rel" not closed
The cause of this is that GetSubscriptionRelState() doesn't close the
relation in the SUBREL_STATE_UNKNOWN case. It seems that commit ce0fdbfe9
forgot to close it. I've attached the patch to fix this issue.
Thanks for the report and fix. Your patch LGTM. I'll push it tomorrow.
Pushed!
--
With Regards,
Amit Kapila.
On Thu, Feb 25, 2021 at 1:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 24, 2021 at 5:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 24, 2021 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
I got the following WARNING message from a logical replication apply worker:
WARNING: relcache reference leak: relation "pg_subscription_rel" not closed
The cause of this is that GetSubscriptionRelState() doesn't close the
relation in the SUBREL_STATE_UNKNOWN case. It seems that commit ce0fdbfe9
forgot to close it. I've attached the patch to fix this issue.
Thanks for the report and fix. Your patch LGTM. I'll push it tomorrow.
Pushed!
Thank you!
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/