Deadlock between logrep apply worker and tablesync worker
On my machine, the src/test/subscription/t/002_types.pl test
usually takes right about 1.5 seconds:
$ time make check PROVE_FLAGS=--timer PROVE_TESTS=t/002_types.pl
...
[14:22:12] t/002_types.pl .. ok 1550 ms ( 0.00 usr 0.00 sys + 0.70 cusr 0.25 csys = 0.95 CPU)
[14:22:13]
I noticed however that sometimes (at least one try in ten, for me)
it takes about 2.5 seconds:
[14:22:16] t/002_types.pl .. ok 2591 ms ( 0.00 usr 0.00 sys + 0.69 cusr 0.28 csys = 0.97 CPU)
[14:22:18]
and I've even seen 3.5 seconds. I dug into this and eventually
identified the cause: it's a deadlock between a subscription's apply
worker and a tablesync worker that it's spawned. Sometimes the apply
worker calls wait_for_relation_state_change (to wait for the tablesync
worker to finish) while it's holding a lock on pg_replication_origin.
If that's the case, then when the tablesync worker reaches
process_syncing_tables_for_sync it is able to perform
UpdateSubscriptionRelState and reach the transaction commit below
that; but when it tries to do replorigin_drop_by_name a little further
down, it blocks on acquiring ExclusiveLock on pg_replication_origin.
So we have an undetected deadlock. We escape that because
wait_for_relation_state_change has a 1-second timeout, after which
it rechecks GetSubscriptionRelState and is able to see the committed
relation state change; so it continues, and eventually releases its
transaction and the lock, permitting the tablesync worker to finish.
I've not tracked down the exact circumstances in which the apply
worker ends up holding a problematic lock, but it seems likely
that it corresponds to cases where its main loop has itself called
replorigin_drop_by_name, a bit further up, for some other concurrent
tablesync operation. (In all the cases I've traced through, the apply
worker is herding multiple tablesync workers when this happens.)
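[Editorial note: the lock-ordering trap described above can be sketched in miniature. This is a toy Python model, not PostgreSQL code: one thread stands in for the apply worker polling for a state change while holding the origin lock, the other for the tablesync worker that commits its state change and then blocks on the same lock. All names here are invented for the sketch.]

```python
import threading
import time

origin_lock = threading.Lock()      # stands in for the lock on pg_replication_origin
relstate = {"value": "SYNCWAIT"}    # stands in for the committed pg_subscription_rel state
events = []

def apply_worker():
    with origin_lock:               # the problematic case: lock held while waiting
        # wait_for_relation_state_change analogue: poll with a timeout, then
        # recheck the (committed, hence visible) relation state
        while relstate["value"] != "SYNCDONE":
            time.sleep(0.05)        # scaled-down stand-in for the 1-second timeout
        events.append("apply: saw SYNCDONE, releasing lock")

def tablesync_worker():
    time.sleep(0.1)
    relstate["value"] = "SYNCDONE"  # UpdateSubscriptionRelState + commit analogue
    events.append("sync: state committed")
    with origin_lock:               # replorigin_drop_by_name blocks here...
        events.append("sync: origin dropped")  # ...until the apply worker lets go

a = threading.Thread(target=apply_worker)
s = threading.Thread(target=tablesync_worker)
a.start(); s.start(); a.join(); s.join()
print(events)
```

Without the timeout-and-recheck in the apply worker's loop, this interleaving would be a hard deadlock: the waiter holds the very lock the other side needs before it can report completion.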
I experimented with having the apply worker release its locks
before waiting for the tablesync worker, as attached. This passes
check-world and it seems to eliminate the test runtime instability,
but I wonder whether it's semantically correct. This whole business
of taking table-wide ExclusiveLock on pg_replication_origin looks
like a horrid kluge that we should try to get rid of, not least
because I don't see any clear documentation of what hazard it's
trying to prevent.
Another thing that has a bad smell about it is the fact that
process_syncing_tables_for_sync uses two transactions in the first
place. There's a comment there claiming that it's for crash safety,
but I can't help suspecting it's really because this case becomes a
hard deadlock without that mid-function commit.
It's not great in any case that the apply worker can move on in
the belief that the tablesync worker is done when in fact the latter
still has catalog state updates to make. And I wonder what we're
doing with having both of them calling replorigin_drop_by_name
... shouldn't that responsibility belong to just one of them?
So I think this whole area deserves a hard look, and I'm not at
all sure that what's attached is a good solution.
regards, tom lane
Attachments:
hack-to-prevent-logrep-worker-deadlocks.patch (text/x-diff, +15 -3)
On Mon, Jan 23, 2023 at 1:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> On my machine, the src/test/subscription/t/002_types.pl test
> usually takes right about 1.5 seconds:
>
> $ time make check PROVE_FLAGS=--timer PROVE_TESTS=t/002_types.pl
> ...
> [14:22:12] t/002_types.pl .. ok 1550 ms ( 0.00 usr 0.00 sys + 0.70 cusr 0.25 csys = 0.95 CPU)
> [14:22:13]
>
> I noticed however that sometimes (at least one try in ten, for me)
> it takes about 2.5 seconds:
>
> [14:22:16] t/002_types.pl .. ok 2591 ms ( 0.00 usr 0.00 sys + 0.69 cusr 0.28 csys = 0.97 CPU)
> [14:22:18]
>
> and I've even seen 3.5 seconds. I dug into this and eventually
> identified the cause: it's a deadlock between a subscription's apply
> worker and a tablesync worker that it's spawned. Sometimes the apply
> worker calls wait_for_relation_state_change (to wait for the tablesync
> worker to finish) while it's holding a lock on pg_replication_origin.
> If that's the case, then when the tablesync worker reaches
> process_syncing_tables_for_sync it is able to perform
> UpdateSubscriptionRelState and reach the transaction commit below
> that; but when it tries to do replorigin_drop_by_name a little further
> down, it blocks on acquiring ExclusiveLock on pg_replication_origin.
> So we have an undetected deadlock. We escape that because
> wait_for_relation_state_change has a 1-second timeout, after which
> it rechecks GetSubscriptionRelState and is able to see the committed
> relation state change; so it continues, and eventually releases its
> transaction and the lock, permitting the tablesync worker to finish.
>
> I've not tracked down the exact circumstances in which the apply
> worker ends up holding a problematic lock, but it seems likely
> that it corresponds to cases where its main loop has itself called
> replorigin_drop_by_name, a bit further up, for some other concurrent
> tablesync operation. (In all the cases I've traced through, the apply
> worker is herding multiple tablesync workers when this happens.)
>
> I experimented with having the apply worker release its locks
> before waiting for the tablesync worker, as attached.
I don't see any problem with your proposed change but I was wondering
if it would be better to commit the transaction and release locks
immediately after performing the replication origin drop? By doing
that, we will minimize the amount of time the transaction holds the
lock.
> This passes
> check-world and it seems to eliminate the test runtime instability,
> but I wonder whether it's semantically correct. This whole business
> of taking table-wide ExclusiveLock on pg_replication_origin looks
> like a horrid kluge that we should try to get rid of, not least
> because I don't see any clear documentation of what hazard it's
> trying to prevent.
IIRC, this is done to prevent concurrent drops of the origin, say by the
exposed API pg_replication_origin_drop(). See the discussion in [1]
related to it. If we want, we can optimize it so that we acquire
the lock on the specific origin as mentioned in the comments in
replorigin_drop_by_name(), but it was not clear that this operation
would be frequent enough.
[1]: /messages/by-id/CAHut+PuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A@mail.gmail.com
--
With Regards,
Amit Kapila.
On Mon, Jan 23, 2023 at 1:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Another thing that has a bad smell about it is the fact that
> process_syncing_tables_for_sync uses two transactions in the first
> place. There's a comment there claiming that it's for crash safety,
> but I can't help suspecting it's really because this case becomes a
> hard deadlock without that mid-function commit.
>
> It's not great in any case that the apply worker can move on in
> the belief that the tablesync worker is done when in fact the latter
> still has catalog state updates to make. And I wonder what we're
> doing with having both of them calling replorigin_drop_by_name
> ... shouldn't that responsibility belong to just one of them?
Originally, it was being dropped at one place only (via the tablesync
worker), but we found a race condition as mentioned in the comments in
process_syncing_tables_for_sync() before the start of the second
transaction, which led to this change. See the report and discussion
about that race condition in the email [1].
[1]: /messages/by-id/CAD21AoAw0Oofi4kiDpJBOwpYyBBBkJj=sLUOn4Gd2GjUAKG-fw@mail.gmail.com
--
With Regards,
Amit Kapila.
On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit.kapila16@gmail.com> wrote:
> IIRC, this is done to prevent concurrent drops of the origin, say by the
> exposed API pg_replication_origin_drop(). See the discussion in [1]
> related to it. If we want, we can optimize it so that we acquire
> the lock on the specific origin as mentioned in the comments in
> replorigin_drop_by_name(), but it was not clear that this operation
> would be frequent enough.
Here is an attached patch to lock the replication origin record using
LockSharedObject instead of locking the pg_replication_origin relation in
ExclusiveLock mode. Now a tablesync worker will wait only if it is
trying to drop the same replication origin that the apply worker is
concurrently dropping (or has already dropped); the other tablesync
workers will be able to drop their replication origins without any wait.
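[Editorial note: the intended behavioral change can be illustrated with a toy model (invented names, not the patch itself). With one coarse table-wide lock, any two droppers serialize; with a per-origin lock, analogous to LockSharedObject on a specific origin OID, only droppers of the same origin contend.]

```python
import threading

# Per-origin locks, analogous to LockSharedObject(ReplicationOriginRelationId,
# roident, ...), instead of one lock on the whole pg_replication_origin table.
origin_locks = {roident: threading.Lock() for roident in (101, 102)}
origins = {101, 102}            # surviving origin ids
dropped = []

def drop_origin(roident):
    with origin_locks[roident]:  # contend only with drops of the *same* origin
        if roident in origins:   # re-check existence once the lock is held
            origins.discard(roident)
            dropped.append(roident)

# Two tablesync workers dropping *different* origins no longer wait on each other.
t1 = threading.Thread(target=drop_origin, args=(101,))
t2 = threading.Thread(target=drop_origin, args=(102,))
t1.start(); t2.start(); t1.join(); t2.join()

drop_origin(101)                 # a repeated drop is a silent no-op in this sketch
```

Whether a repeated drop should be a silent no-op or an error is exactly the missing_ok question discussed later in the thread.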
Regards,
Vignesh
Attachments:
0001-Lock-the-replication-origin-record-instead-of-lockin.patch (text/x-patch, +25 -13)
On Fri, Jan 27, 2023 at 3:45 PM vignesh C <vignesh21@gmail.com> wrote:
> On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > IIRC, this is done to prevent concurrent drops of the origin, say by the
> > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > related to it. If we want, we can optimize it so that we acquire
> > the lock on the specific origin as mentioned in the comments in
> > replorigin_drop_by_name(), but it was not clear that this operation
> > would be frequent enough.
>
> Here is an attached patch to lock the replication origin record using
> LockSharedObject instead of locking the pg_replication_origin relation in
> ExclusiveLock mode. Now a tablesync worker will wait only if it is
> trying to drop the same replication origin that the apply worker is
> concurrently dropping (or has already dropped); the other tablesync
> workers will be able to drop their replication origins without any wait.
There is a code in the function replorigin_drop_guts() that uses the
functionality introduced by replorigin_exists(). Can we reuse this
function for the same?
Also, it would be good if you can share the numbers for different runs
of "src/test/subscription/t/002_types.pl" before and after the patch.
--
With Regards,
Amit Kapila.
On Friday, January 27, 2023 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jan 27, 2023 at 3:45 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > IIRC, this is done to prevent concurrent drops of the origin, say by the
> > > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > > related to it. If we want, we can optimize it so that we acquire
> > > the lock on the specific origin as mentioned in the comments in
> > > replorigin_drop_by_name(), but it was not clear that this operation
> > > would be frequent enough.
> >
> > Here is an attached patch to lock the replication origin record using
> > LockSharedObject instead of locking the pg_replication_origin relation in
> > ExclusiveLock mode. Now a tablesync worker will wait only if it is
> > trying to drop the same replication origin that the apply worker is
> > concurrently dropping (or has already dropped); the other tablesync
> > workers will be able to drop their replication origins without any wait.
>
> There is a code in the function replorigin_drop_guts() that uses the
> functionality introduced by replorigin_exists(). Can we reuse this
> function for the same?
Maybe we can use SearchSysCacheExists1 to check the existence instead of
adding a new function.
One comment about the patch.
@@ -430,23 +445,21 @@ replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
...
+ /* Drop the replication origin if it has not been dropped already */
+ if (replorigin_exists(roident))
replorigin_drop_guts(rel, roident, nowait);
If the developer passes missing_ok as false, should we report an ERROR here
instead of silently returning?
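[Editorial note: the question above is about which behavior the caller should see. A toy sketch of the missing_ok contract (hypothetical helper, not the actual patch):]

```python
def drop_origin(name, origins, missing_ok=False):
    """Toy analogue of replorigin_drop_by_name()'s missing_ok contract."""
    if name not in origins:
        if not missing_ok:
            # the behavior being asked about: raise an error rather than
            # silently returning when the origin has already gone away
            raise LookupError('replication origin "%s" does not exist' % name)
        return
    origins.discard(name)

origins = {"pg_16389_1"}
drop_origin("pg_16389_1", origins)                   # normal drop succeeds
drop_origin("pg_16389_1", origins, missing_ok=True)  # tolerated no-op
try:
    drop_origin("pg_16389_1", origins)               # missing_ok=False: error
    raised = False
except LookupError:
    raised = True
```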
Best Regards,
Hou zj
On Sat, Jan 28, 2023 at 9:36 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> On Friday, January 27, 2023 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jan 27, 2023 at 3:45 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > IIRC, this is done to prevent concurrent drops of the origin, say by the
> > > > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > > > related to it. If we want, we can optimize it so that we acquire
> > > > the lock on the specific origin as mentioned in the comments in
> > > > replorigin_drop_by_name(), but it was not clear that this operation
> > > > would be frequent enough.
> > >
> > > Here is an attached patch to lock the replication origin record using
> > > LockSharedObject instead of locking the pg_replication_origin relation in
> > > ExclusiveLock mode. Now a tablesync worker will wait only if it is
> > > trying to drop the same replication origin that the apply worker is
> > > concurrently dropping (or has already dropped); the other tablesync
> > > workers will be able to drop their replication origins without any wait.
> >
> > There is a code in the function replorigin_drop_guts() that uses the
> > functionality introduced by replorigin_exists(). Can we reuse this
> > function for the same?
>
> Maybe we can use SearchSysCacheExists1 to check the existence instead of
> adding a new function.
Yeah, I think that would be better.
> One comment about the patch.
>
> @@ -430,23 +445,21 @@ replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
> ...
> +	/* Drop the replication origin if it has not been dropped already */
> +	if (replorigin_exists(roident))
> 		replorigin_drop_guts(rel, roident, nowait);
>
> If the developer passes missing_ok as false, should we report an ERROR here
> instead of silently returning?
One thing that looks a bit odd is that we will anyway have a similar
check in replorigin_drop_guts() which is a static function and called
from only one place, so, will it be required to check at both places?
--
With Regards,
Amit Kapila.
On Sat, 28 Jan 2023 at 11:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Jan 28, 2023 at 9:36 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Friday, January 27, 2023 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jan 27, 2023 at 3:45 PM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > IIRC, this is done to prevent concurrent drops of the origin, say by the
> > > > > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > > > > related to it. If we want, we can optimize it so that we acquire
> > > > > the lock on the specific origin as mentioned in the comments in
> > > > > replorigin_drop_by_name(), but it was not clear that this operation
> > > > > would be frequent enough.
> > > >
> > > > Here is an attached patch to lock the replication origin record using
> > > > LockSharedObject instead of locking the pg_replication_origin relation in
> > > > ExclusiveLock mode. Now a tablesync worker will wait only if it is
> > > > trying to drop the same replication origin that the apply worker is
> > > > concurrently dropping (or has already dropped); the other tablesync
> > > > workers will be able to drop their replication origins without any wait.
> > >
> > > There is a code in the function replorigin_drop_guts() that uses the
> > > functionality introduced by replorigin_exists(). Can we reuse this
> > > function for the same?
> >
> > Maybe we can use SearchSysCacheExists1 to check the existence instead of
> > adding a new function.
>
> Yeah, I think that would be better.
>
> > One comment about the patch.
> >
> > @@ -430,23 +445,21 @@ replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
> > ...
> > +	/* Drop the replication origin if it has not been dropped already */
> > +	if (replorigin_exists(roident))
> > 		replorigin_drop_guts(rel, roident, nowait);
> >
> > If the developer passes missing_ok as false, should we report an ERROR here
> > instead of silently returning?
>
> One thing that looks a bit odd is that we will anyway have a similar
> check in replorigin_drop_guts() which is a static function and called
> from only one place, so, will it be required to check at both places?
There is a possibility that the initial check to verify that the replication
origin exists in replorigin_drop_by_name was successful, but later one
of the table sync worker or apply worker processes might have dropped
the replication origin, so it is better to check again before calling
replorigin_drop_guts. Ideally the tuple should be valid in
replorigin_drop_guts, but can we keep the check as it is, to maintain
the consistency check before calling CatalogTupleDelete?
Regards,
Vignesh
On Fri, 27 Jan 2023 at 17:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jan 27, 2023 at 3:45 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > IIRC, this is done to prevent concurrent drops of the origin, say by the
> > > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > > related to it. If we want, we can optimize it so that we acquire
> > > the lock on the specific origin as mentioned in the comments in
> > > replorigin_drop_by_name(), but it was not clear that this operation
> > > would be frequent enough.
> >
> > Here is an attached patch to lock the replication origin record using
> > LockSharedObject instead of locking the pg_replication_origin relation in
> > ExclusiveLock mode. Now a tablesync worker will wait only if it is
> > trying to drop the same replication origin that the apply worker is
> > concurrently dropping (or has already dropped); the other tablesync
> > workers will be able to drop their replication origins without any wait.
>
> There is a code in the function replorigin_drop_guts() that uses the
> functionality introduced by replorigin_exists(). Can we reuse this
> function for the same?
>
> Also, it would be good if you can share the numbers for different runs
> of "src/test/subscription/t/002_types.pl" before and after the patch.
Using only Tom Lane's fix, I noticed that the execution time varies
between 3.6 seconds and 4.4 seconds.
Using only the "changing to LockSharedObject" fix, I noticed that
the execution time varies between 3.8 seconds and 4.6 seconds.
Using the combined fix (Tom Lane's fix + changing to
LockSharedObject), I noticed that the execution time varies
between 3.5 seconds and 3.8 seconds.
I felt both changes will be required, as the combination will also
handle the scenario where both the apply worker and the table sync
worker try to drop the same replication origin.
The execution results for the same:
With only Tom Lane's fix:
[12:25:32] t/002_types.pl .. ok 3604 ms ( 0.00 usr 0.00 sys + 2.26 cusr 0.37 csys = 2.63 CPU)
[12:25:48] t/002_types.pl .. ok 3788 ms ( 0.00 usr 0.00 sys + 2.24 cusr 0.39 csys = 2.63 CPU)
[12:26:01] t/002_types.pl .. ok 3783 ms ( 0.00 usr 0.00 sys + 2.42 cusr 0.37 csys = 2.79 CPU)
[12:26:14] t/002_types.pl .. ok 3845 ms ( 0.00 usr 0.00 sys + 2.38 cusr 0.44 csys = 2.82 CPU)
[12:26:29] t/002_types.pl .. ok 3923 ms ( 0.00 usr 0.00 sys + 2.54 cusr 0.39 csys = 2.93 CPU)
[12:26:42] t/002_types.pl .. ok 4416 ms ( 0.00 usr 0.00 sys + 2.73 cusr 0.48 csys = 3.21 CPU)
[12:26:55] t/002_types.pl .. ok 4310 ms ( 0.00 usr 0.00 sys + 2.62 cusr 0.39 csys = 3.01 CPU)
[12:27:09] t/002_types.pl .. ok 4168 ms ( 0.00 usr 0.00 sys + 2.67 cusr 0.46 csys = 3.13 CPU)
[12:27:21] t/002_types.pl .. ok 4167 ms ( 0.00 usr 0.00 sys + 2.46 cusr 0.53 csys = 2.99 CPU)
[12:27:34] t/002_types.pl .. ok 4144 ms ( 0.00 usr 0.00 sys + 2.59 cusr 0.41 csys = 3.00 CPU)
[12:27:46] t/002_types.pl .. ok 3982 ms ( 0.00 usr 0.00 sys + 2.52 cusr 0.41 csys = 2.93 CPU)
[12:28:03] t/002_types.pl .. ok 4190 ms ( 0.01 usr 0.00 sys + 2.67 cusr 0.46 csys = 3.14 CPU)
With only "Changing to LockSharedObject" fix:
[12:33:02] t/002_types.pl .. ok 3815 ms ( 0.00 usr 0.00 sys + 2.30 cusr 0.38 csys = 2.68 CPU)
[12:33:16] t/002_types.pl .. ok 4295 ms ( 0.00 usr 0.00 sys + 2.66 cusr 0.42 csys = 3.08 CPU)
[12:33:31] t/002_types.pl .. ok 4270 ms ( 0.00 usr 0.00 sys + 2.72 cusr 0.44 csys = 3.16 CPU)
[12:33:44] t/002_types.pl .. ok 4460 ms ( 0.00 usr 0.00 sys + 2.78 cusr 0.45 csys = 3.23 CPU)
[12:33:58] t/002_types.pl .. ok 4340 ms ( 0.01 usr 0.00 sys + 2.67 cusr 0.45 csys = 3.13 CPU)
[12:34:11] t/002_types.pl .. ok 4142 ms ( 0.00 usr 0.00 sys + 2.58 cusr 0.42 csys = 3.00 CPU)
[12:34:24] t/002_types.pl .. ok 4459 ms ( 0.00 usr 0.00 sys + 2.76 cusr 0.49 csys = 3.25 CPU)
[12:34:38] t/002_types.pl .. ok 4427 ms ( 0.00 usr 0.00 sys + 2.68 cusr 0.48 csys = 3.16 CPU)
[12:35:10] t/002_types.pl .. ok 4642 ms ( 0.00 usr 0.00 sys + 2.84 cusr 0.55 csys = 3.39 CPU)
[12:35:22] t/002_types.pl .. ok 4047 ms ( 0.01 usr 0.00 sys + 2.49 cusr 0.46 csys = 2.96 CPU)
[12:35:32] t/002_types.pl .. ok 4505 ms ( 0.01 usr 0.00 sys + 2.90 cusr 0.45 csys = 3.36 CPU)
[12:36:03] t/002_types.pl .. ok 4088 ms ( 0.00 usr 0.00 sys + 2.51 cusr 0.42 csys = 2.93 CPU)
002_types with combination of Tom Lane's and "Changing to LockSharedObject" fix:
[10:22:04] t/002_types.pl .. ok 3730 ms ( 0.00 usr 0.00 sys + 2.30 cusr 0.41 csys = 2.71 CPU)
[10:23:40] t/002_types.pl .. ok 3666 ms ( 0.00 usr 0.00 sys + 2.16 cusr 0.42 csys = 2.58 CPU)
[10:23:31] t/002_types.pl .. ok 3665 ms ( 0.00 usr 0.00 sys + 2.31 cusr 0.40 csys = 2.71 CPU)
[10:23:23] t/002_types.pl .. ok 3500 ms ( 0.00 usr 0.00 sys + 2.20 cusr 0.36 csys = 2.56 CPU)
[10:23:14] t/002_types.pl .. ok 3704 ms ( 0.00 usr 0.00 sys + 2.36 cusr 0.35 csys = 2.71 CPU)
[10:23:05] t/002_types.pl .. ok 3594 ms ( 0.00 usr 0.00 sys + 2.32 cusr 0.31 csys = 2.63 CPU)
[10:24:10] t/002_types.pl .. ok 3702 ms ( 0.00 usr 0.00 sys + 2.27 cusr 0.42 csys = 2.69 CPU)
[10:24:22] t/002_types.pl .. ok 3741 ms ( 0.00 usr 0.00 sys + 2.39 cusr 0.36 csys = 2.75 CPU)
[10:24:38] t/002_types.pl .. ok 3676 ms ( 0.00 usr 0.00 sys + 2.28 cusr 0.43 csys = 2.71 CPU)
[10:24:50] t/002_types.pl .. ok 3843 ms ( 0.00 usr 0.00 sys + 2.36 cusr 0.43 csys = 2.79 CPU)
[10:25:03] t/002_types.pl .. ok 3710 ms ( 0.00 usr 0.00 sys + 2.30 cusr 0.36 csys = 2.66 CPU)
[10:25:12] t/002_types.pl .. ok 3695 ms ( 0.00 usr 0.00 sys + 2.34 cusr 0.35 csys = 2.69 CPU)
002_types on HEAD:
[10:31:05] t/002_types.pl .. ok 5687 ms ( 0.00 usr 0.00 sys + 2.35 cusr 0.45 csys = 2.80 CPU)
[10:31:31] t/002_types.pl .. ok 6815 ms ( 0.00 usr 0.00 sys + 2.61 cusr 0.43 csys = 3.04 CPU)
[10:31:47] t/002_types.pl .. ok 5561 ms ( 0.00 usr 0.00 sys + 2.24 cusr 0.47 csys = 2.71 CPU)
[10:32:05] t/002_types.pl .. ok 4542 ms ( 0.00 usr 0.00 sys + 2.27 cusr 0.39 csys = 2.66 CPU)
[10:32:20] t/002_types.pl .. ok 3663 ms ( 0.00 usr 0.00 sys + 2.30 cusr 0.38 csys = 2.68 CPU)
[10:32:33] t/002_types.pl .. ok 3627 ms ( 0.00 usr 0.00 sys + 2.27 cusr 0.32 csys = 2.59 CPU)
[10:32:45] t/002_types.pl .. ok 3808 ms ( 0.00 usr 0.00 sys + 2.41 cusr 0.39 csys = 2.80 CPU)
[10:32:59] t/002_types.pl .. ok 4536 ms ( 0.00 usr 0.00 sys + 2.24 cusr 0.38 csys = 2.62 CPU)
[10:33:13] t/002_types.pl .. ok 3638 ms ( 0.00 usr 0.00 sys + 2.25 cusr 0.41 csys = 2.66 CPU)
[10:33:35] t/002_types.pl .. ok 4796 ms ( 0.00 usr 0.00 sys + 2.38 cusr 0.38 csys = 2.76 CPU)
[10:33:51] t/002_types.pl .. ok 4695 ms ( 0.00 usr 0.00 sys + 2.40 cusr 0.37 csys = 2.77 CPU)
[10:34:06] t/002_types.pl .. ok 5738 ms ( 0.00 usr 0.00 sys + 2.44 cusr 0.43 csys = 2.87 CPU)
Regards,
Vignesh
On Mon, Jan 30, 2023 at 9:20 AM vignesh C <vignesh21@gmail.com> wrote:
> On Sat, 28 Jan 2023 at 11:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > One thing that looks a bit odd is that we will anyway have a similar
> > check in replorigin_drop_guts() which is a static function and called
> > from only one place, so, will it be required to check at both places?
>
> There is a possibility that the initial check to verify that the replication
> origin exists in replorigin_drop_by_name was successful, but later one
> of the table sync worker or apply worker processes might have dropped
> the replication origin,
Won't locking on the particular origin prevent concurrent drops? IIUC,
the drop happens after the patch acquires the lock on the origin.
--
With Regards,
Amit Kapila.
On Monday, January 30, 2023 2:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jan 30, 2023 at 9:20 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Sat, 28 Jan 2023 at 11:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > One thing that looks a bit odd is that we will anyway have a similar
> > > check in replorigin_drop_guts() which is a static function and
> > > called from only one place, so, will it be required to check at both places?
> >
> > There is a possibility that the initial check to verify that the replication
> > origin exists in replorigin_drop_by_name was successful, but later one
> > of the table sync worker or apply worker processes might have dropped
> > the replication origin,
>
> Won't locking on the particular origin prevent concurrent drops? IIUC, the
> drop happens after the patch acquires the lock on the origin.
Yes, I think the existence check in replorigin_drop_guts is unnecessary as we
already lock the origin before that. I think the check in replorigin_drop_guts
is a custom check after calling SearchSysCache1 to get the tuple, but the error
should not happen as no concurrent drop can be performed.
To make it simpler, one idea is to move the code that gets the tuple from
the system cache into replorigin_drop_by_name(). After locking the origin, we
can try to get the tuple and do the existence check, and we can reuse
this tuple to perform the origin delete. In this approach we only need to check
origin existence once, after locking. BTW, if we do this, then we'd better rename
replorigin_drop_guts() to something like replorigin_state_clear(), as the function
would only clear the in-memory information after that.
The code could be like:
-------
replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
...
	/*
	 * Lock the origin to prevent concurrent drops. We keep the lock until the
	 * end of transaction.
	 */
	LockSharedObject(ReplicationOriginRelationId, roident, 0,
					 AccessExclusiveLock);

	tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
	if (!HeapTupleIsValid(tuple))
	{
		if (!missing_ok)
			elog(ERROR, "cache lookup failed for replication origin with ID %d",
				 roident);

		return;
	}

	replorigin_state_clear(rel, roident, nowait);

	/*
	 * Now, we can delete the catalog entry.
	 */
	CatalogTupleDelete(rel, &tuple->t_self);
	ReleaseSysCache(tuple);

	CommandCounterIncrement();
...
-------
Best Regards,
Hou zj
On Mon, 30 Jan 2023 at 13:00, houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> On Monday, January 30, 2023 2:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jan 30, 2023 at 9:20 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Sat, 28 Jan 2023 at 11:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > One thing that looks a bit odd is that we will anyway have a similar
> > > > check in replorigin_drop_guts() which is a static function and
> > > > called from only one place, so, will it be required to check at both places?
> > >
> > > There is a possibility that the initial check to verify that the replication
> > > origin exists in replorigin_drop_by_name was successful, but later one
> > > of the table sync worker or apply worker processes might have dropped
> > > the replication origin,
> >
> > Won't locking on the particular origin prevent concurrent drops? IIUC, the
> > drop happens after the patch acquires the lock on the origin.
>
> Yes, I think the existence check in replorigin_drop_guts is unnecessary as we
> already lock the origin before that. I think the check in replorigin_drop_guts
> is a custom check after calling SearchSysCache1 to get the tuple, but the error
> should not happen as no concurrent drop can be performed.
This scenario is possible while creating a subscription: the apply worker
will try to drop the replication origin if the state is
SUBREL_STATE_SYNCDONE. The table sync worker will set the state to
SUBREL_STATE_SYNCDONE and update the relation state before calling
replorigin_drop_by_name. Since the transaction is committed by the table
sync worker, the state is visible to the apply worker, so the apply worker
will try to drop the replication origin in parallel in this case.
There is a race condition here: one of the processes (table sync
worker or apply worker) will acquire the lock and drop the replication
origin; the other process will get the lock only after the first process
drops the origin and commits the transaction. The other process will then
try to drop the replication origin once it acquires the lock, and will get
the error (from replorigin_drop_guts): cache lookup failed for replication
origin with ID.
Concurrent drop is possible in this case.
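[Editorial note: the interleaving described above can be reproduced in miniature with a toy Python model (invented names). Both workers pass the pre-lock existence check; the loser of the lock race then finds the origin gone, which is why a post-lock recheck, or a missing_ok-style escape, matters.]

```python
import threading

origin_lock = threading.Lock()  # per-origin lock, as in the proposed patch
origins = {"pg_16389_1"}
outcomes = []

def drop_worker(name):
    # Both the apply worker and the tablesync worker can get past this
    # check before either of them has taken the lock.
    if name in origins:
        with origin_lock:
            if name in origins:  # post-lock recheck avoids the error path
                origins.discard(name)
                outcomes.append("dropped")
            else:
                # without the recheck, this branch would be the
                # "cache lookup failed for replication origin" error
                outcomes.append("already gone")

workers = [threading.Thread(target=drop_worker, args=("pg_16389_1",))
           for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Exactly one worker performs the drop; the other either skips it at the pre-check or takes the "already gone" branch after acquiring the lock.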
Regards,
Vignesh
On Mon, 30 Jan 2023 at 13:00, houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> On Monday, January 30, 2023 2:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jan 30, 2023 at 9:20 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Sat, 28 Jan 2023 at 11:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > One thing that looks a bit odd is that we will anyway have a similar
> > > > check in replorigin_drop_guts() which is a static function and
> > > > called from only one place, so, will it be required to check at both places?
> > >
> > > There is a possibility that the initial check to verify that the replication
> > > origin exists in replorigin_drop_by_name was successful, but later one
> > > of the table sync worker or apply worker processes might have dropped
> > > the replication origin,
> >
> > Won't locking on the particular origin prevent concurrent drops? IIUC, the
> > drop happens after the patch acquires the lock on the origin.
>
> Yes, I think the existence check in replorigin_drop_guts is unnecessary as we
> already lock the origin before that. I think the check in replorigin_drop_guts
> is a custom check after calling SearchSysCache1 to get the tuple, but the error
> should not happen as no concurrent drop can be performed.
>
> To make it simpler, one idea is to move the code that gets the tuple from
> the system cache into replorigin_drop_by_name(). After locking the origin, we
> can try to get the tuple and do the existence check, and we can reuse
> this tuple to perform the origin delete. In this approach we only need to check
> origin existence once, after locking. BTW, if we do this, then we'd better rename
> replorigin_drop_guts() to something like replorigin_state_clear(), as the function
> would only clear the in-memory information after that.
>
> The code could be like:
> -------
> replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
> ...
> 	/*
> 	 * Lock the origin to prevent concurrent drops. We keep the lock until the
> 	 * end of transaction.
> 	 */
> 	LockSharedObject(ReplicationOriginRelationId, roident, 0,
> 					 AccessExclusiveLock);
>
> 	tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
> 	if (!HeapTupleIsValid(tuple))
> 	{
> 		if (!missing_ok)
> 			elog(ERROR, "cache lookup failed for replication origin with ID %d",
> 				 roident);
>
> 		return;
> 	}
>
> 	replorigin_state_clear(rel, roident, nowait);
>
> 	/*
> 	 * Now, we can delete the catalog entry.
> 	 */
> 	CatalogTupleDelete(rel, &tuple->t_self);
> 	ReleaseSysCache(tuple);
>
> 	CommandCounterIncrement();
> ...
> -------
+1 for this change, as it removes the redundant check. I will post an
updated version with this change.
Regards,
Vignesh
On Mon, 30 Jan 2023 at 13:00, houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Monday, January 30, 2023 2:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jan 30, 2023 at 9:20 AM vignesh C <vignesh21@gmail.com> wrote:
On Sat, 28 Jan 2023 at 11:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
One thing that looks a bit odd is that we will anyway have a similar
check in replorigin_drop_guts() which is a static function and
called from only one place, so, will it be required to check at both places?

There is a possibility that the initial check to verify if replication
origin exists in replorigin_drop_by_name was successful but later one
of either table sync worker or apply worker process might have dropped
the replication origin,

Won't locking on the particular origin prevent concurrent drops? IIUC, the
drop happens after the patch acquires the lock on the origin.

Yes, I think the existence check in replorigin_drop_guts is unnecessary as we
already lock the origin before that. I think the check in replorigin_drop_guts
is a custom check after calling SearchSysCache1 to get the tuple, but the error
should not happen as no concurrent drop can be performed.

To make it simpler, one idea is to move the code that gets the tuple from
the system cache into replorigin_drop_by_name(). After locking the origin, we
can try to get the tuple and do the existence check, and we can reuse
this tuple to perform the origin delete. In this approach we only need to check
origin existence once after locking. BTW, if we do this, then we'd better rename
replorigin_drop_guts() to something like replorigin_state_clear() as the
function only clears the in-memory information after that.

The code could be like:
-------
replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
...
/*
* Lock the origin to prevent concurrent drops. We keep the lock until the
* end of transaction.
*/
    LockSharedObject(ReplicationOriginRelationId, roident, 0,
                     AccessExclusiveLock);

    tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
    if (!HeapTupleIsValid(tuple))
    {
        if (!missing_ok)
            elog(ERROR, "cache lookup failed for replication origin with ID %d",
                 roident);
        return;
    }

    replorigin_state_clear(rel, roident, nowait);

    /*
     * Now, we can delete the catalog entry.
     */
    CatalogTupleDelete(rel, &tuple->t_self);
    ReleaseSysCache(tuple);

    CommandCounterIncrement();
The attached updated patch has the changes to handle the same.
Regards,
Vignesh
Attachments:
v2-0001-Lock-the-replication-origin-record-instead-of-loc.patch
On Mon, 30 Jan 2023 at 17:30, vignesh C <vignesh21@gmail.com> wrote:
On Mon, 30 Jan 2023 at 13:00, houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
[earlier quoted discussion and code trimmed; identical to the message above]
The attached updated patch has the changes to handle the same.
I had not merged one of the local changes that was present, please
find the updated patch including that change. Sorry for missing that
change.
Regards,
Vignesh
Attachments:
v3-0001-Lock-the-replication-origin-record-instead-of-loc.patch
On Tuesday, January 31, 2023 1:07 AM vignesh C <vignesh21@gmail.com> wrote:
On Mon, 30 Jan 2023 at 17:30, vignesh C <vignesh21@gmail.com> wrote:
[earlier quoted discussion and code trimmed; identical to the messages above]
I had not merged one of the local changes that was present, please find the
updated patch including that change. Sorry for missing that change.
I also tried to test the time of "src/test/subscription/t/002_types.pl"
before and after the patch (changing the lock level) and Tom's patch
(splitting the transaction), like what Vignesh has shared on -hackers.
I ran it about 100 times for each case. Tom's patch and the lock-level patch
behave similarly on my machines[1].
HEAD: 3426 ~ 6425 ms
HEAD + Tom: 3404 ~ 3462 ms
HEAD + Vignesh: 3419 ~ 3474 ms
HEAD + Tom + Vignesh: 3408 ~ 3454 ms
Even apart from the test-time reduction, reducing the lock level and locking
the specific object also helps reduce lock contention, which benefits users
of the exposed function as well as the table sync worker and apply worker.
So, I think pushing the patch to change the lock level makes sense.
And the patch looks good to me.
While on it, after pushing the patch, I think there is another case that might
also be worth improving: the table sync worker and apply worker can try to drop
the same origin, which might cause some delay. This is a separate case
(different from the deadlock), so I feel we can try to improve it in another
patch.
[1]: CentOS 8.2, 128G RAM, 40 processors Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
Best Regards,
Hou zj
On Thu, Feb 2, 2023 at 12:05 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Tuesday, January 31, 2023 1:07 AM vignesh C <vignesh21@gmail.com> wrote:
[quoted timing results trimmed; identical to the message above]
So, I think pushing the patch to change the lock level makes sense. And the
patch looks good to me.
Thanks for the tests. I also see a reduction in test time variability
with Vignesh's patch. I think we can release the locks in case the
origin is concurrently dropped as in the attached patch. I am planning
to commit this patch tomorrow unless there are more comments or
objections.
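The behavior described here, releasing the lock when the origin turns out to
have been concurrently dropped, can be modeled as a minimal standalone sketch.
The lock and catalog stand-ins below are purely illustrative; they are not
PostgreSQL's lock manager or syscache API:

```c
#include <stdbool.h>

/* Illustrative stand-ins for the lock manager and catalog lookup */
static bool origin_exists;
static int	locks_held;

static void
lock_origin(void)
{
	locks_held++;
}

static void
unlock_origin(void)
{
	locks_held--;
}

/*
 * Model of the committed behavior: after taking the object lock, re-check
 * that the origin still exists. If a concurrent drop removed it and
 * missing_ok is set, release the lock before returning instead of holding
 * it until end of transaction for a no-op.
 */
static bool
drop_origin(bool missing_ok)
{
	lock_origin();
	if (!origin_exists)
	{
		unlock_origin();
		return missing_ok;		/* true = silently done, false = error */
	}
	origin_exists = false;
	unlock_origin();			/* in PostgreSQL: held until commit */
	return true;
}
```

The key point is only the early unlock on the missing-origin path; in the real
code the successful path keeps the object lock until transaction end.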
While on it, after pushing the patch, I think there is another case that might
also be worth improving: the table sync worker and apply worker can try to drop
the same origin, which might cause some delay. This is a separate case
(different from the deadlock), so I feel we can try to improve it in another
patch.
Right, I think that case could be addressed by Tom's patch to some
extent but I am thinking we should also try to analyze if we can
completely avoid the need to remove origins from both processes. One
idea could be to introduce another relstate something like
PRE_SYNCDONE and set it in a separate transaction before we set the
state as SYNCDONE and remove the slot and origin in tablesync worker.
Now, if the tablesync worker errors out due to some reason during the
second transaction, it can remove the slot and origin after restart by
checking the state. However, it would add another relstate which may
not be the best way to address this problem. Anyway, that can be
accomplished as a separate patch.
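The idea above can be sketched as a tiny standalone state machine. All names
here (RelSync, the ST_* codes, the function split) are illustrative stand-ins
for the actual tablesync code, not PostgreSQL definitions:

```c
#include <stdbool.h>

/* Illustrative relstate codes modeled on pg_subscription_rel */
typedef enum
{
	ST_INIT,
	ST_DATASYNC,
	ST_FINISHEDCOPY,
	ST_PRE_SYNCDONE,			/* proposed: sync finished, cleanup pending */
	ST_SYNCDONE,
	ST_READY
} RelState;

typedef struct
{
	RelState	state;
	bool		slot_exists;
	bool		origin_exists;
} RelSync;

/* Transaction 1: commit the fact that sync finished but cleanup is pending. */
static void
mark_pre_syncdone(RelSync *rel)
{
	rel->state = ST_PRE_SYNCDONE;
}

/* Transaction 2: drop the slot and origin, then advance to SYNCDONE. */
static void
finish_cleanup(RelSync *rel)
{
	rel->slot_exists = false;
	rel->origin_exists = false;
	rel->state = ST_SYNCDONE;
}

/*
 * On tablesync worker (re)start: a committed PRE_SYNCDONE state means the
 * previous worker died between the two transactions, so the restarted
 * worker redoes the cleanup itself; the apply worker never needs to drop
 * the origin.
 */
static void
tablesync_start(RelSync *rel)
{
	if (rel->state == ST_PRE_SYNCDONE)
		finish_cleanup(rel);
}
```

The extra state is what makes the cleanup restartable: the crash window
between the two transactions is visible in the catalog afterwards.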
--
With Regards,
Amit Kapila.
Attachments:
v4-0001-Optimize-the-origin-drop-functionality.patch
On Thu, Feb 2, 2023 at 4:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks for the tests. I also see a reduction in test time variability
with Vignesh's patch. I think we can release the locks in case the
origin is concurrently dropped as in the attached patch. I am planning
to commit this patch tomorrow unless there are more comments or
objections.
Pushed.
--
With Regards,
Amit Kapila.
On Thursday, February 2, 2023 7:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Feb 2, 2023 at 12:05 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
[earlier quoted discussion trimmed; identical to the messages above]
Right, I think that case could be addressed by Tom's patch to some extent but
I am thinking we should also try to analyze if we can completely avoid the need
to remove origins from both processes. One idea could be to introduce
another relstate something like PRE_SYNCDONE and set it in a separate
transaction before we set the state as SYNCDONE and remove the slot and
origin in tablesync worker.
Now, if the tablesync worker errors out due to some reason during the second
transaction, it can remove the slot and origin after restart by checking the state.
However, it would add another relstate which may not be the best way to
address this problem. Anyway, that can be accomplished as a separate patch.
Here is an attempt to achieve the same.
Basically, the patch removes the code that drops the origin in the apply
worker, and adds a new state PRE_SYNCDONE that is set after synchronization
has finished in front of apply (sublsn set), but before dropping the origin
and the other final cleanups. The tablesync worker will restart and redo the
cleanup if it failed after reaching the new state. Besides, since changes can
already be applied to the table in the PRE_SYNCDONE state, I also modified the
check in should_apply_changes_for_rel(). Some other conditions for the origin
drop in subscription commands were adjusted in this patch as well.
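A rough standalone sketch of the modified check mentioned above (simplified;
the state names, the LSN type stand-in, and the exact comparison here are
illustrative, not the actual worker.c code):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr_t;	/* stand-in for PostgreSQL's XLogRecPtr */

/* Illustrative subset of the relstates involved in the check */
typedef enum
{
	REL_FINISHEDCOPY,
	REL_PRE_SYNCDONE,			/* new: synced, final cleanup pending */
	REL_SYNCDONE,
	REL_READY
} ApplyRelState;

/*
 * Simplified model of should_apply_changes_for_rel(): the apply worker
 * handles changes for a table once it is READY, or once the remote position
 * has passed the sync point. With the patch, PRE_SYNCDONE counts the same
 * as SYNCDONE here, since synchronization itself is already finished.
 */
static bool
should_apply_changes_for_rel(ApplyRelState state,
							 XLogRecPtr_t statelsn,
							 XLogRecPtr_t remote_lsn)
{
	if (state == REL_READY)
		return true;
	if (state == REL_SYNCDONE || state == REL_PRE_SYNCDONE)
		return statelsn <= remote_lsn;
	return false;
}
```

The only behavioral change modeled is that PRE_SYNCDONE joins SYNCDONE on the
"already synchronized" side of the test.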
Best Regards,
Hou zj
Attachments:
0001-Avoid-dropping-origins-from-both-apply-and-tablesync.patch
On Fri, Feb 3, 2023 at 6:58 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
...
Here are some review comments for the 0001 patch
======
General Comment
0.
The idea of using the extra relstate for clean-up seems OK, but the
implementation of the new state in this patch appears misordered and
misnamed to me.
The state name should indicate what it is doing (PRE_SYNCDONE is
meaningless). The patch describes in several places that this state
means "synchronized, but not yet cleaned up" therefore IMO it means
the SYNCDONE state should be *before* this new state. And since this
new state is for "cleanup" then let's call it something like that.
To summarize, I don’t think the meaning of SYNCDONE should be touched.
SYNCDONE means the synchronization is done, same as before. And your
new "cleanup" state belongs directly *after* that. IMO it should be
like this:
1. STATE_INIT
2. STATE_DATASYNC
3. STATE_FINISHEDCOPY
4. STATE_SYNCDONE
5. STATE_CLEANUP <-- new relstate
6. STATE_READY
Of course, this is going to impact almost every aspect of the patch,
but I think everything will be basically the same as you have it now
-- only all the state names and comments need to be adjusted according
to the above.
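The ordering proposed above can be written down as a tiny sketch. The letter
codes follow the later suggestion for pg_subscription_rel.srsubstate and are
purely illustrative, not the patch's actual definitions:

```c
/*
 * Sketch of the proposed relstate ordering: the new "cleanup" state sits
 * directly after SYNCDONE, whose meaning is unchanged.
 */
typedef enum
{
	SRS_INIT = 'i',				/* initialize */
	SRS_DATASYNC = 'd',			/* data is being copied */
	SRS_FINISHEDCOPY = 'f',	/* finished table copy */
	SRS_SYNCDONE = 's',			/* synchronization done (same as before) */
	SRS_CLEANUPDONE = 'c',		/* new: cleanup done */
	SRS_READY = 'r'				/* ready (normal replication) */
} SubRelState;

/* The expected forward progression of a table being synchronized. */
static const SubRelState srs_progression[] = {
	SRS_INIT, SRS_DATASYNC, SRS_FINISHEDCOPY,
	SRS_SYNCDONE, SRS_CLEANUPDONE, SRS_READY
};

/* Next state in the progression, or the same state once READY. */
static SubRelState
srs_next(SubRelState s)
{
	int			n = sizeof(srs_progression) / sizeof(srs_progression[0]);

	for (int i = 0; i < n - 1; i++)
		if (srs_progression[i] == s)
			return srs_progression[i + 1];
	return s;
}
```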
======
Commit Message
1.
Previously, we allowed the apply worker to drop the origin to avoid the case
that the tablesync worker fails to the origin(due to crash). In this case we
don't restart the tablesync worker, and the apply worker can clean the origin.
~
There seem to be some words missing in this paragraph.
SUGGESTION
Previously, we allowed the apply worker to drop the origin as a way to
recover from the scenario where the tablesync worker failed to drop it
(due to crash).
~~~
2.
To improve this, we introduce a new relstate SUBREL_STATE_PRE_SYNCDONE which
will be set after synchronization finished in front of apply (sublsn set), but
before dropping the origin and other final cleanups. The apply worker will
restart tablesync worker if the relstate is SUBREL_STATE_PRE_SYNCDONE. This
way, even if the tablesync worker error out in the transaction that tries to
drop the origin, the apply worker will restart the tablesync worker to redo the
cleanup(for origin and other stuff) and then directly exit.
~
2a.
This is going to be impacted by my "General Comment". Notice how you describe
again that the state "will be set after synchronization finished". That is
evidence again that the new CLEANUP state should directly follow the SYNCDONE
state.
2b.
"error out” --> "encounters an error"
2c.
"cleanup(for origin" --> space before the "("
======
doc/src/sgml/catalogs.sgml
3.
@@ -8071,7 +8071,8 @@ SCRAM-SHA-256$<replaceable><iteration
count></replaceable>:<replaceable>&l
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
<literal>f</literal> = finished table copy,
- <literal>s</literal> = synchronized,
+ <literal>p</literal> = synchronized but not yet cleaned up,
+ <literal>s</literal> = synchronization done,
<literal>r</literal> = ready (normal replication)
</para></entry>
</row>
@@ -8082,8 +8083,8 @@ SCRAM-SHA-256$<replaceable><iteration
count></replaceable>:<replaceable>&l
</para>
<para>
Remote LSN of the state change used for synchronization coordination
- when in <literal>s</literal> or <literal>r</literal> states,
- otherwise null
+ when in <literal>p</literal>, <literal>s</literal> or
+ <literal>r</literal> states, otherwise null
</para></entry>
</row>
</tbody>
This state order and choice of the letter are impacted by my "General Comment".
IMO it should be more like this:
State code: i = initialize, d = data is being copied, f = finished
table copy, s = synchronization done, c = clean-up done, r = ready
(normal replication)
======
src/backend/commands/subscriptioncmds.c
4. AlterSubscription_refresh
Some adjustments are needed according to my "General Comment".
~~~
5. DropSubscription
Some adjustments are needed according to my "General Comment".
======
src/backend/replication/logical/tablesync.c
6.
+ * Update the state of the table to SUBREL_STATE_SYNCDONE and cleanup the
+ * tablesync slot and drop the tablesync's origin tracking.
+ */
+static void
+finish_synchronization(bool restart_after_crash)
6a.
Suggest calling this function something like 'cleanup_after_synchronization'
~
6b.
Some adjustments to states and comments are needed according to my
"General Comment".
~~~
7. process_syncing_tables_for_sync
- MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
+ MyLogicalRepWorker->relstate = SUBREL_STATE_PRE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
This should just be setting SUBREL_STATE_SYNCDONE how it previously did.
Other states/comments in this function to change according to my
"General Comments".
~
8.
if (rstate->state == SUBREL_STATE_SYNCDONE)
{
/*
* Apply has caught up to the position where the table sync has
* finished. Mark the table as ready so that the apply will just
* continue to replicate it normally.
*/
That should now be checking for SUBREL_STATE_CLEANUPDONE according to
my "General Comment".
~~~
9. process_syncing_tables_for_apply
Some adjustments to states and comments are needed according to my
"General Comment".
~~~
10. LogicalRepSyncTableStart
Some adjustments to states and comments are needed according to my
"General Comment".
======
src/backend/replication/logical/worker.c
11. should_apply_changes_for_rel
Some adjustments to states according to my "General Comment".
======
src/include/catalog/pg_subscription_rel.h
12.
@@ -62,8 +62,10 @@
DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index,
6117, Subsc
* NULL) */
#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
* (sublsn NULL) */
-#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
- * apply (sublsn set) */
+#define SUBREL_STATE_PRE_SYNCDONE 'p' /* synchronization finished in front of
+ * apply (sublsn set), but the final
+ * cleanup has not yet been performed */
+#define SUBREL_STATE_SYNCDONE 's' /* synchronization complete */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
Some adjustments to states and comments are needed according to my
"General Comment".
------
Kind Regards,
Peter Smith.
Fujitsu Australia