undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION
Hi,
I decided to do some stress-testing of the built-in logical replication,
as part of the sequence decoding work. And I soon ran into an undetected
deadlock related to ALTER SUBSCRIPTION ... REFRESH PUBLICATION :-(
The attached bash script triggers that in a couple of seconds for me. The
script looks complicated, but most of the code is waiting for sync to
complete, catchup, and that sort of thing.
What the script does is pretty simple:
1) initialize two clusters, set them as publisher/subscriber pair
2) create some number of tables, add them to publication and wait for
the sync to complete
3) start two pgbench runs in the background, modifying the publication
(one removes+adds all tables in a single transaction, one does that
with transaction per table)
4) run refresh.sh which does ALTER SUBSCRIPTION ... REFRESH PUBLICATION
in a loop (now that I think about it, could be another pgbench
script, but well ...)
5) some consistency checks, but the lockup happens earlier so this does
not really matter
After a small number of refresh cycles (for me it's usually a couple
dozen), we end up with a couple stuck locks (I shortened the backend
type string a bit, for formatting reasons):
test=# select a.pid, classid, objid, backend_type, query
from pg_locks l join pg_stat_activity a on (a.pid = l.pid)
where not granted;
pid | classid | objid | backend_type | query
---------+---------+-------+------------------+----------------------
2691941 | 6100 | 16785 | client backend | ALTER SUBSCRIPTION s
REFRESH PUBLICATION
2691837 | 6100 | 16785 | tablesync worker |
2691936 | 6100 | 16785 | tablesync worker |
(3 rows)
All these backends wait for 6100/16785, which is the subscription row in
pg_subscription. The tablesync workers are requesting AccessShareLock,
the client backend, however, asks for AccessExclusiveLock.
The entry is currently locked by:
test=# select a.pid, mode, backend_type from pg_locks l
join pg_stat_activity a on (a.pid = l.pid)
where classid=6100 and objid=16785 and granted;
pid | mode | backend_type
---------+-----------------+----------------------------------
2690477 | AccessShareLock | logical replication apply worker
(1 row)
But the apply worker is not waiting for any locks, so what's going on?
Well, the problem is that the apply worker is waiting for a notification
from the tablesync workers that the relation is synced, which happens through
updating the pg_subscription_rel row. And that wait happens in
wait_for_relation_state_change, which simply checks the row in a loop,
with a sleep by WaitLatch().
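For reference, the wait loop looks roughly like this (a trimmed sketch of
wait_for_relation_state_change() in src/backend/replication/logical/tablesync.c,
paraphrased rather than copied verbatim):

static bool
wait_for_relation_state_change(Oid relid, char expected_state)
{
    for (;;)
    {
        LogicalRepWorker *worker;
        XLogRecPtr  statelsn;
        char        state;

        CHECK_FOR_INTERRUPTS();

        /* Re-read the pg_subscription_rel row for this relation. */
        InvalidateCatalogSnapshot();
        state = GetSubscriptionRelState(MyLogicalRepWorker->subid,
                                        relid, &statelsn);

        if (state == SUBREL_STATE_UNKNOWN || state == expected_state)
            return (state == expected_state);

        /* Bail out if the tablesync worker has gone away. */
        LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
        worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
                                        relid, false);
        LWLockRelease(LogicalRepWorkerLock);
        if (!worker)
            return false;

        /* No lock wait here -- just sleep on the latch and retry. */
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         1000L, WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE);
        ResetLatch(MyLatch);
    }
}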
Unfortunately, the tablesync workers can't update the row because the
client backend executing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
sneaked in, and waits for an AccessExclusiveLock. So the tablesync
workers are stuck in the queue and can't proceed.
The client backend can't proceed, because it's waiting for a lock held
by the apply worker.
The tablesync workers can't proceed because their lock request is stuck
behind the AccessExclusiveLock request.
And the apply worker can't proceed, because it's waiting for status
update from the tablesync workers.
And the deadlock is undetected, because the apply worker is not waiting
on a lock, but sleeping on a latch :-(
I don't know what the right solution is here. I wonder if the apply
worker might release the lock before waiting for the update; that'd
solve this whole issue.
Alternatively, ALTER SUBSCRIPTION might wait for the lock only for a
limited amount of time and try again (but then it'd be susceptible to
starvation, of course).
Or maybe there's a way to make this work in a way that would be visible
to the deadlock detector? That'd mean we occasionally get processes
killed to resolve a deadlock, but that's still better than processes
stuck indefinitely ...
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Nov 21, 2023 at 5:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
All these backends wait for 6100/16785, which is the subscription row in
pg_subscription. The tablesync workers are requesting AccessShareLock,
the client backend, however, asks for AccessExclusiveLock.
Well, the problem is that the apply worker is waiting for a notification
from the tablesync workers that the relation is synced, which happens
through updating the pg_subscription_rel row. And that wait happens in
wait_for_relation_state_change, which simply checks the row in a loop,
with a sleep by WaitLatch().
Unfortunately, the tablesync workers can't update the row because the
client backend executing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
sneaked in, and waits for an AccessExclusiveLock. So the tablesync
workers are stuck in the queue and can't proceed.
The client backend can't proceed, because it's waiting for a lock held
by the apply worker.
It seems there is some inconsistency in what you have written for
client backends/tablesync worker vs. apply worker. The above text
seems to be saying that the client backend and table sync worker are
waiting on a "subscription row in pg_subscription" and the apply
worker is operating on "pg_subscription_rel". So, if that is true then
they shouldn't get stuck.
I think here the client backend and tablesync worker seem to be blocked
waiting for a lock on pg_subscription_rel.
The tablesync workers can't proceed because their lock request is stuck
behind the AccessExclusiveLock request. And the apply worker can't proceed,
because it's waiting for status
update from the tablesync workers.
This part is not clear to me because
wait_for_relation_state_change()->GetSubscriptionRelState() seems to
be releasing the lock while closing the relation. Am I missing
something?
--
With Regards,
Amit Kapila.
On 11/21/23 14:16, Amit Kapila wrote:
It seems there is some inconsistency in what you have written for
client backends/tablesync worker vs. apply worker. The above text
seems to be saying that the client backend and table sync worker are
waiting on a "subscription row in pg_subscription" and the apply
worker is operating on "pg_subscription_rel". So, if that is true then
they shouldn't get stuck.
I think here the client backend and tablesync worker seem to be blocked
waiting for a lock on pg_subscription_rel.
Not really, they are all locking the subscription. All the locks are on
classid=6100, which is pg_subscription:
test=# select 6100::regclass;
regclass
-----------------
pg_subscription
(1 row)
The thing is, the tablesync workers call UpdateSubscriptionRelState,
which locks the pg_subscription catalog at the very beginning:
LockSharedObject(SubscriptionRelationId, ...);
So that's the issue. I haven't explored why it's done this way, and
there's no comment explaining why locking the subscription is needed ...
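For context, the function starts roughly like this (a trimmed sketch of
UpdateSubscriptionRelState() from src/backend/catalog/pg_subscription.c; the
tuple-update details are omitted):

void
UpdateSubscriptionRelState(Oid subid, Oid relid, char state,
                           XLogRecPtr sublsn)
{
    Relation    rel;

    /* Lock the subscription object itself first ... */
    LockSharedObject(SubscriptionRelationId, subid, 0, AccessShareLock);

    /* ... and only then open/lock the pg_subscription_rel catalog. */
    rel = table_open(SubscriptionRelRelationId, RowExclusiveLock);

    /* ... look up and update the tuple for (subid, relid) ... */

    table_close(rel, NoLock);
}

It's that initial LockSharedObject() call that queues the tablesync workers
behind the AccessExclusiveLock requested by ALTER SUBSCRIPTION ... REFRESH
PUBLICATION.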
And the apply worker can't proceed, because it's waiting for status
update from the tablesync workers.
This part is not clear to me because
wait_for_relation_state_change()->GetSubscriptionRelState() seems to
be releasing the lock while closing the relation. Am I missing
something?
I think you're missing the fact that GetSubscriptionRelState() acquires
and releases the lock on pg_subscription_rel, but that's not the lock
causing the issue. The problem is the lock on the pg_subscription row.
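To illustrate the difference, a trimmed sketch of GetSubscriptionRelState()
(also from src/backend/catalog/pg_subscription.c, with the not-found handling
omitted) shows that it locks and unlocks the pg_subscription_rel catalog
relation, which is unrelated to the object lock on the subscription that the
blocked processes are queued on:

char
GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
{
    char        substate;
    bool        isnull;
    Datum       d;
    HeapTuple   tup;
    Relation    rel;

    /* Relation-level lock on the pg_subscription_rel catalog. */
    rel = table_open(SubscriptionRelRelationId, AccessShareLock);

    tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
                          ObjectIdGetDatum(relid),
                          ObjectIdGetDatum(subid));

    d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup,
                        Anum_pg_subscription_rel_srsubstate, &isnull);
    substate = DatumGetChar(d);

    /* ... srsublsn extraction omitted ... */

    ReleaseSysCache(tup);

    /* This releases the AccessShareLock taken above. */
    table_close(rel, AccessShareLock);

    return substate;
}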
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Nov 21, 2023 at 6:56 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
The thing is, the tablesync workers call UpdateSubscriptionRelState,
which locks the pg_subscription catalog at the very beginning:
LockSharedObject(SubscriptionRelationId, ...);
So that's the issue. I haven't explored why it's done this way, and
there's no comment explaining why locking the subscription is needed ...
I think it prevents concurrent drop of rel during the REFRESH operation.
I think you're missing the fact that GetSubscriptionRelState() acquires
and releases the lock on pg_subscription_rel, but that's not the lock
causing the issue. The problem is the lock on the pg_subscription row.
Okay. IIUC, what's going on here is that the apply worker acquires
AccessShareLock on pg_subscription to update the rel state for one of the
tables, say tbl-1, and then for another table, say tbl-2, it starts
waiting for a state change via wait_for_relation_state_change(). I
think here the fix is to commit the transaction before we go for a
wait. I guess we need something along the lines of what is proposed in
[1], though we have solved the problem in that thread in some other
way.

[1]: /messages/by-id/1412708.1674417574@sss.pgh.pa.us
--
With Regards,
Amit Kapila.
On 11/22/23 11:38, Amit Kapila wrote:
I think it prevents concurrent drop of rel during the REFRESH operation.
Yes. Or maybe some other concurrent DDL on the relations included in the
subscription.
Okay. IIUC, what's going on here is that the apply worker acquires
AccessShareLock on pg_subscription to update the rel state for one of the
tables, say tbl-1, and then for another table, say tbl-2, it starts
waiting for a state change via wait_for_relation_state_change(). I
think here the fix is to commit the transaction before we go for a
wait. I guess we need something along the lines of what is proposed in
[1], though we have solved the problem in that thread in some other
way.
Possibly. I haven't checked if the commit might have some unexpected
consequences, but I can confirm I can no longer reproduce the deadlock
with the patch applied.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Nov 22, 2023 at 4:51 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
Possibly. I haven't checked if the commit might have some unexpected
consequences, but I can confirm I can no longer reproduce the deadlock
with the patch applied.
Thanks for the verification. Offhand, I don't see any problem with
doing a commit at that place but will try to think some more about it.
I think we may want to call pgstat_report_stat(false) after commit to
avoid a long delay in stats.
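Something along these lines before the apply worker enters the wait (just a
sketch of the idea, not the final patch):

    /* Release any locks held by the open transaction before waiting. */
    CommitTransactionCommand();

    /* Report stats now, so the wait doesn't delay them further. */
    pgstat_report_stat(false);

    /* Start a fresh transaction for the subsequent catalog lookups. */
    StartTransactionCommand();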
I haven't verified but I think this will be a problem in back-branches as well.
--
With Regards,
Amit Kapila.
On 11/23/23 10:24, Amit Kapila wrote:
Thanks for the verification. Offhand, I don't see any problem with
doing a commit at that place but will try to think some more about it.
I think we may want to call pgstat_report_stat(false) after commit to
avoid a long delay in stats.
Makes sense.
I haven't verified but I think this will be a problem in
back-branches as well.
Yes. I haven't tried but I don't see why backbranches wouldn't have the
same issue.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
I tried to reproduce the issue and was able to reproduce it with the
scripts shared by Tomas.
I tried testing it from PG 17 down to PG 11, and the issue is reproducible
on each version.
Next, I will test with the patch from the thread shared by Amit.
Thanks,
Shlok Kumar Kyal
Hi,
I have created a v1 patch to resolve the issue and have tested the
patch on HEAD down to PG 12.
The same patch applies to all the versions. The changes are similar to
the one posted in the thread
/messages/by-id/1412708.1674417574@sss.pgh.pa.us
Thanks and Regards,
Shlok Kyal
Attachments:
v1-0001-Deadlock-when-apply-worker-tablesync-worker-and-c.patch
From 549ebeac998a0f81d0b2a99191db84e153dcc582 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Fri, 24 Nov 2023 16:37:14 +0530
Subject: [PATCH v1] Deadlock when apply worker,tablesync worker and client
backend run parallelly.
Apply worker holds a lock on the table pg_subscription_rel and waits for
notification from the tablesync workers where the relation is synced, which
happens through updating the pg_subscription_rel row. And that wait happens in
wait_for_relation_state_change, which simply checks the row in a loop, with a
sleep by WaitLatch(). Unfortunately, the tablesync workers can't update the row
because the client backend executing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
sneaked in, and waits for an AccessExclusiveLock. So the tablesync workers are
stuck in the queue and can't proceed.
The client backend is waiting for a lock held by the apply worker. The tablesync
workers can't proceed because their lock request is stuck behind the
AccessExclusiveLock request. And the apply worker is waiting for status update
from the tablesync workers. And the deadlock is undetected, because the apply
worker is not waiting on a lock, but sleeping on a latch.
We have resolved the issue by releasing the lock by committing the transaction
before the apply worker goes to wait state.
---
src/backend/replication/logical/tablesync.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index df3c42eb5d..e2b3ce64cb 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -542,15 +542,28 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
LWLockRelease(LogicalRepWorkerLock);
/*
- * Enter busy loop and wait for synchronization worker to
- * reach expected state (or die trying).
+ * If we have a transaction, we must commit it to release
+ * any locks we have (it's quite likely we hold lock on
+ * pg_replication_origin, which the sync worker will need
+ * to update). Then start a new transaction so we can
+ * examine catalog state.
*/
- if (!started_tx)
+ if (started_tx)
+ {
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+ StartTransactionCommand();
+ }
+ else
{
StartTransactionCommand();
started_tx = true;
}
+ /*
+ * Enter busy loop and wait for synchronization worker to
+ * reach expected state (or die trying).
+ */
wait_for_relation_state_change(rstate->relid,
SUBREL_STATE_SYNCDONE);
}
--
2.34.1
On Fri, Nov 24, 2023 at 5:05 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I have created a v1 patch to resolve the issue.
(it's quite likely we hold lock on
+ * pg_replication_origin, which the sync worker will need
+ * to update).
This part of the comment is stale and doesn't hold true. You need to
update the reason based on the latest problem discovered in this
thread. I think you can compare the timing of regression tests in
subscription, with and without the patch to show there is no
regression. And probably some tests with a large number of tables for
sync with very little data.
--
With Regards,
Amit Kapila.
Hi,
I think you can compare the timing of regression tests in
subscription, with and without the patch to show there is no
regression. And probably some tests with a large number of tables for
sync with very little data.
I have tested the regression test timings for subscription with and
without the patch. I also did the timing test for syncing a subscription
with the publisher for 100 and 1000 tables respectively.
I have attached the test script; the results of the timing tests are as
follows:
Time taken for the tests to run in a Linux VM:

Summary               | Subscription Test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
----------------------+-------------------------+---------------------------------+----------------------------------
Without patch Release | 95.564                  | 7.877                           | 58.919
With patch Release    | 96.513                  | 6.533                           | 45.807

Time taken for the tests to run in another Linux VM:

Summary               | Subscription Test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
----------------------+-------------------------+---------------------------------+----------------------------------
Without patch Release | 109.8145                | 6.4675                          | 83.001
With patch Release    | 113.162                 | 7.947                           | 87.113

Time taken for the tests to run on the performance machine (Linux):

Summary               | Subscription Test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
----------------------+-------------------------+---------------------------------+----------------------------------
Without patch Release | 115.871                 | 6.656                           | 81.157
With patch Release    | 115.922                 | 6.7305                          | 81.1525
thoughts?
Thanks and Regards,
Shlok Kyal
Attachments:
On Sat, Dec 2, 2023 at 9:52 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
Time taken for the tests to run in another Linux VM:

Summary               | Subscription Test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
----------------------+-------------------------+---------------------------------+----------------------------------
Without patch Release | 109.8145                | 6.4675                          | 83.001
With patch Release    | 113.162                 | 7.947                           | 87.113
So, on some machines, it may increase the test timing although not too
much. I think the reason is probably doing the work in multiple
transactions for multiple relations. I am wondering: instead of
committing and starting a new transaction before
wait_for_relation_state_change(), what if we do it inside that
function just before we decide to wait? It is quite possible that in
many cases we don't need any wait at all.
--
With Regards,
Amit Kapila.
On 12/4/23 12:37, Amit Kapila wrote:
So, on some machines, it may increase the test timing although not too
much. I think the reason is probably doing the work in multiple
transactions for multiple relations. I am wondering: instead of
committing and starting a new transaction before
wait_for_relation_state_change(), what if we do it inside that
function just before we decide to wait? It is quite possible that in
many cases we don't need any wait at all.
I'm not sure what you mean by "do it". What should the function do?
As for the test results, I strongly suspect the differences are caused
simply by random timing variations, or something like that. And I
don't understand what "Performance Machine Linux" is, considering those
timings are slower than on the other two machines.
Also, even if it was a bit slower, does it really matter? I mean, the
current code is wrong and can get stuck indefinitely if it happens to
hit the deadlock. And it's a one-time action, so I don't think it's very
sensitive in terms of performance.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Dec 4, 2023 at 5:30 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
I'm not sure what you mean by "do it". What should the function do?
I mean to commit the open transaction at the below place in
wait_for_relation_state_change()
wait_for_relation_state_change()
{
...
-- commit the xact
WaitLatch();
...
}
Then start a new transaction after the wait is over. This is just to test whether it
improves the difference in regression test timing.
Also, even if it was a bit slower, does it really matter? I mean, the
current code is wrong and can get stuck indefinitely if it happens to
hit the deadlock. And it's a one-time action, so I don't think it's very
sensitive in terms of performance.
Yeah, I see that point, but I am trying to evaluate whether we can avoid
an increase in regression test timing, at least for HEAD. The tests
are done in release mode, so I suspect there could be a slightly
bigger gap in debug mode (we can check that once, though), which may hit
developers running regressions quite often in their development
environments. Now, if there is no easy way to avoid the increase in
regression test timing, we still have to fix the problem, so we have
to take the hit of some increase in time.
--
With Regards,
Amit Kapila.
Hi,
And I don't understand what "Performance Machine Linux" is, considering
those timings are slower than on the other two machines.
The machine has a total memory of 755.536 GB, 120 CPUs, and runs the RHEL 7
operating system. Please also find the detailed info about the performance
machine attached.
Thanks and Regards,
Shlok Kyal
On 12/5/23 08:14, Shlok Kyal wrote:
The machine has a total memory of 755.536 GB, 120 CPUs, and runs the RHEL 7
operating system. Please also find the detailed info about the performance
machine attached.
Thanks for the info. I don't think the tests really benefit from this
many resources; I would be rather surprised if they got faster beyond 8
cores or so. The CPU frequency likely matters much more, which probably
explains why this machine was the slowest.
Also, I wonder how much the results vary between the runs. I suppose you
only did a single run for each, right?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, 5 Dec 2023 at 17:18, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Also, I wonder how much the results vary between the runs. I suppose you
only did a single run for each, right?
I did 10 runs for each of the cases and reported the median result in
my previous email.
I have documented the results of the runs and have attached them here.
Thanks and Regards,
Shlok Kyal
Attachments:
Hi,
I mean to commit the open transaction at the below place in
wait_for_relation_state_change()
wait_for_relation_state_change()
{
...
-- commit the xact
WaitLatch();
...
}
Then start a new transaction after the wait is over. This is just to test
whether it improves the difference in regression test timing.
I tried the above approach and observed that its performance is nearly
the same as that of the previous approach.
For the Linux VM:

Summary            | Subscription Test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
-------------------+-------------------------+---------------------------------+----------------------------------
old patch          | 107.4545                | 6.911                           | 77.918
alternate approach | 108.3985                | 6.9835                          | 78.111

For the performance machine:

Summary            | Subscription Test (sec) | 100 tables in pub and sub (sec) | 1000 tables in pub and sub (sec)
-------------------+-------------------------+---------------------------------+----------------------------------
old patch          | 115.922                 | 6.7305                          | 81.1525
alternate approach | 115.8215                | 6.7685                          | 81.2335
I have attached the patch for this approach as 'alternate-approach.patch'.
Since the performance is the same, I think the previous approach is
better, as in this approach we are calling CommitTransactionCommand()
and StartTransactionCommand() inside a for loop.
I also fixed the comment in the previous approach and attached it here as
'v2-0001-Deadlock-when-apply-worker-tablesync-worker-and-c.patch'.
Thanks and Regards
Shlok Kyal
Attachments:
v2-0001-Deadlock-when-apply-worker-tablesync-worker-and-c.patch
From 11072d138d900227b963b54d1a3626cf256db721 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Fri, 24 Nov 2023 16:37:14 +0530
Subject: [PATCH v2] Deadlock when apply worker,tablesync worker and client
backend run parallelly.
Apply worker holds a lock on the table pg_subscription_rel and waits for
notification from the tablesync workers where the relation is synced, which
happens through updating the pg_subscription_rel row. And that wait happens in
wait_for_relation_state_change, which simply checks the row in a loop, with a
sleep by WaitLatch(). Unfortunately, the tablesync workers can't update the row
because the client backend executing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
sneaked in, and waits for an AccessExclusiveLock. So the tablesync workers are
stuck in the queue and can't proceed.
The client backend is waiting for a lock held by the apply worker. The tablesync
workers can't proceed because their lock request is stuck behind the
AccessExclusiveLock request. And the apply worker is waiting for status update
from the tablesync workers. And the deadlock is undetected, because the apply
worker is not waiting on a lock, but sleeping on a latch.
We have resolved the issue by releasing the lock by committing the transaction
before the apply worker goes to wait state.
---
src/backend/replication/logical/tablesync.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index df3c42eb5d..b71234d998 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -542,15 +542,26 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
LWLockRelease(LogicalRepWorkerLock);
/*
- * Enter busy loop and wait for synchronization worker to
- * reach expected state (or die trying).
+ * If we have a transaction, we must commit it to release
+ * any locks we have. Then start a new transaction so we
+ * can examine catalog state.
*/
- if (!started_tx)
+ if (started_tx)
+ {
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+ StartTransactionCommand();
+ }
+ else
{
StartTransactionCommand();
started_tx = true;
}
+ /*
+ * Enter busy loop and wait for synchronization worker to
+ * reach expected state (or die trying).
+ */
wait_for_relation_state_change(rstate->relid,
SUBREL_STATE_SYNCDONE);
}
--
2.34.1
alternate-approach.patch
From ce261399d22a7fd9ee7ccdd3b9d1cc7f4f72ce4b Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Thu, 7 Dec 2023 10:55:26 +0530
Subject: [PATCH] Deadlock when apply worker,tablesync worker and client
backend run parallelly.
Apply worker holds a lock on the table pg_subscription_rel and waits for
notification from the tablesync workers where the relation is synced, which
happens through updating the pg_subscription_rel row. And that wait happens in
wait_for_relation_state_change, which simply checks the row in a loop, with a
sleep by WaitLatch(). Unfortunately, the tablesync workers can't update the row
because the client backend executing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
sneaked in, and waits for an AccessExclusiveLock. So the tablesync workers are
stuck in the queue and can't proceed.
The client backend is waiting for a lock held by the apply worker. The tablesync
workers can't proceed because their lock request is stuck behind the
AccessExclusiveLock request. And the apply worker is waiting for status update
from the tablesync workers. And the deadlock is undetected, because the apply
worker is not waiting on a lock, but sleeping on a latch.
We have resolved the issue by releasing the lock by committing the transaction
before the apply worker goes to wait state.
---
src/backend/replication/logical/tablesync.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index df3c42eb5d..fb0ac84450 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -174,7 +174,7 @@ finish_sync_worker(void)
* CATCHUP state to SYNCDONE.
*/
static bool
-wait_for_relation_state_change(Oid relid, char expected_state)
+wait_for_relation_state_change(Oid relid, char expected_state, bool *started_tx)
{
char state;
@@ -203,6 +203,16 @@ wait_for_relation_state_change(Oid relid, char expected_state)
if (!worker)
break;
+ /*
+ * If we have a transaction, we must commit it to release any locks we
+ * have. Then start a new transaction so we can examine catalog state.
+ */
+ if (*started_tx)
+ {
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+ StartTransactionCommand();
+ }
(void) WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
1000L, WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE);
@@ -552,7 +562,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
}
wait_for_relation_state_change(rstate->relid,
- SUBREL_STATE_SYNCDONE);
+ SUBREL_STATE_SYNCDONE,
+ &started_tx);
}
else
LWLockRelease(LogicalRepWorkerLock);
--
2.34.1
On Thu, Dec 7, 2023 at 11:21 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I tried the above approach and observed that its performance is nearly
the same as that of the previous approach.
Then let's go with the original patch only. BTW, it took almost the
same time (105 wallclock secs) in my environment (CentOS VM) to run
tests in src/test/subscription both with and without the patch. I took
a median of five runs. I have slightly adjusted the comments and
commit message in the attached. If you are fine with this, we can
commit and backpatch this.
--
With Regards,
Amit Kapila.
Attachments:
v3-0001-Fix-an-undetected-deadlock-due-to-apply-worker.patch
From d9d97f1179f46b1c3763ed9339d8c75019583d55 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 8 Dec 2023 17:07:51 +0530
Subject: [PATCH v3] Fix an undetected deadlock due to apply worker.
The apply worker needs to update the state of the subscription tables to
'READY' during the synchronization phase which requires locking the
corresponding subscription. The apply worker also waits for the
subscription tables to reach the 'SYNCDONE' state after holding the locks
on the subscription and the wait is done using WaitLatch. The 'SYNCDONE'
state is changed by tablesync workers again by locking the corresponding
subscription. Both the state updates use AccessShareLock mode to lock the
subscription, so they can't block each other. However, a backend can
simultaneously try to acquire a lock on the same subscription using
AccessExclusiveLock mode to alter the subscription. Now, the backend's
wait on a lock can sneak in between the apply worker and table sync worker
causing deadlock.
In other words, apply_worker waits for tablesync worker which waits for
backend, and backend waits for apply worker. This is not detected by the
deadlock detector because apply worker uses WaitLatch.
The fix is to release existing locks in apply worker before it starts to
wait for tablesync worker to change the state.
Reported-by: Tomas Vondra
Author: Shlok Kyal
Reviewed-by: Amit Kapila
Backpatch-through: 12
Discussion: https://postgr.es/m/d291bb50-12c4-e8af-2af2-7bb9bb4d8e3e@enterprisedb.com
---
src/backend/replication/logical/tablesync.c | 23 ++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index df3c42eb5d..2bdc1bab55 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -541,16 +541,29 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
/* Now safe to release the LWLock */
LWLockRelease(LogicalRepWorkerLock);
- /*
- * Enter busy loop and wait for synchronization worker to
- * reach expected state (or die trying).
- */
- if (!started_tx)
+ if (started_tx)
+ {
+ /*
+ * We must commit the existing transaction to release
+ * the existing locks before entering a busy loop. This
+ * is required to avoid any undetected deadlocks due to
+ * any existing lock as deadlock detector won't be able
+ * to detect the waits on the latch.
+ */
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+ StartTransactionCommand();
+ }
+ else
{
StartTransactionCommand();
started_tx = true;
}
+ /*
+ * Enter busy loop and wait for synchronization worker to
+ * reach expected state (or die trying).
+ */
wait_for_relation_state_change(rstate->relid,
SUBREL_STATE_SYNCDONE);
}
--
2.28.0.windows.1
Hi,
I have slightly adjusted the comments and commit message in the attached.
If you are fine with this, we can commit and backpatch this.
I have tested the patch for all the branches from PG 17 to PG 12.
The same patch applies cleanly on all branches. Also, the same patch
resolves the issue on all the branches.
I ran all the tests and all the tests passed on each branch.
I also reviewed the patch and it looks good to me.
Thanks and Regards,
Shlok Kyal
On Fri, Dec 8, 2023 at 7:16 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I have tested the patch for all the branches from PG 17 to PG 12.
The same patch applies cleanly on all branches. Also, the same patch
resolves the issue on all the branches.
I ran all the tests and all the tests passed on each branch. I also
reviewed the patch and it looks good to me.
Thanks, I could also reproduce the issue on back branches (tried till
12), and the fix works. I'll push this on Monday.
--
With Regards,
Amit Kapila.
On Sat, Dec 9, 2023 at 12:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I could also reproduce the issue on back branches (tried till
12), and the fix works. I'll push this on Monday.
Peter sent one minor suggestion offlist (to write the check differently
for easier understanding), which I addressed, and pushed the patch.
--
With Regards,
Amit Kapila.
On 12/11/23 07:12, Amit Kapila wrote:
Peter sent one minor suggestion offlist (to write the check differently
for easier understanding), which I addressed, and pushed the patch.
Thanks for taking care of fixing this!
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company