Problem with synchronous replication

Started by Dongming Liu · about 6 years ago · 17 messages
#1 Dongming Liu
lingce.ldm@alibaba-inc.com
1 attachment(s)

Hi,

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached and, if it is not,
acquires SyncRepLock and deletes the element. If the element has already been deleted by the
walsender, it gets deleted a second time and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached or not.
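
To make the race concrete, here is a simplified sketch of the current
function (not the exact source); the comments mark where the walsender
can interleave:

void
SyncRepCleanupAtProcExit(void)
{
    /* Unlocked check: a walsender may be manipulating the queue now. */
    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
    {
        /*
         * Window: before we acquire the lock below, the walsender can
         * take SyncRepLock and remove us from the queue; SHMQueueDelete
         * resets our prev/next links to NULL when it unlinks us.
         */
        LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);

        /*
         * If we were already removed, this second delete dereferences
         * the NULLed links and crashes.
         */
        SHMQueueDelete(&(MyProc->syncRepLinks));
        LWLockRelease(SyncRepLock);
    }
}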

2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt.
In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait
with a suitable warning, as follows:

a. set QueryCancelPending to false
b. ereport outputs a warning
c. SyncRepCancelWait deletes our element from the queue

If another cancel interrupt arrives while we are emitting the warning in step b, errfinish
will call CHECK_FOR_INTERRUPTS, which will raise an ERROR such as "canceling autovacuum
task", and the process will jump back to the sigsetjmp. Unfortunately, step c is then skipped
and the element that should have been deleted by SyncRepCancelWait remains in the queue.

The easiest way to fix this is to swap the order of steps b and c. On the other hand,
letting the sigsetjmp block clean up the queue may also be a good choice. What do you think?
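
In code form, the fragment in question looks roughly like this
(simplified from the wait loop in SyncRepWaitForLSN):

if (QueryCancelPending)
{
    /* step a */
    QueryCancelPending = false;

    /*
     * step b: if errfinish's CHECK_FOR_INTERRUPTS could raise an ERROR
     * here, we would longjmp out and never reach step c below.
     */
    ereport(WARNING,
            (errmsg("canceling wait for synchronous replication due to user request"),
             errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));

    /* step c: remove ourselves from the wait queue */
    SyncRepCancelWait();
    break;
}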

Attached is the patch; any feedback is greatly appreciated.

Best regards,
--
Dongming Liu

Attachments:

0001-Fix-two-bugs-for-synchronous-replication.patch (application/octet-stream)
From 5e5245076ea8105dab88c2bfd2ad90a96b48dbdd Mon Sep 17 00:00:00 2001
From: "lingce.ldm" <lingce.ldm@alibaba-inc.com>
Date: Fri, 25 Oct 2019 14:49:49 +0800
Subject: [PATCH] Fix two bugs for synchronous replication

---
 src/backend/replication/syncrep.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index a21f7d3..7b09d0f 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -261,10 +261,10 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		if (QueryCancelPending)
 		{
 			QueryCancelPending = false;
+			SyncRepCancelWait();
 			ereport(WARNING,
 					(errmsg("canceling wait for synchronous replication due to user request"),
 					 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
-			SyncRepCancelWait();
 			break;
 		}
 
@@ -361,12 +361,10 @@ SyncRepCancelWait(void)
 void
 SyncRepCleanupAtProcExit(void)
 {
+	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 	if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
-	{
-		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 		SHMQueueDelete(&(MyProc->syncRepLinks));
-		LWLockRelease(SyncRepLock);
-	}
+	LWLockRelease(SyncRepLock);
 }
 
 /*
-- 
2.9.2

#2 Dongming Liu
lingce.ldm@alibaba-inc.com
In reply to: Dongming Liu (#1)
Re: Problem with synchronous replication

Can someone help me confirm that these two problems are bugs?
If they are, please help review the patch or suggest better fixes.
Thanks.

Best regards,
--
Dongming Liu
------------------------------------------------------------------
From: LIU Dongming <lingce.ldm@alibaba-inc.com>
Sent At: 2019 Oct. 25 (Fri.) 15:18
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Subject: Problem with synchronous replication

Hi,

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached and, if it is not,
acquires SyncRepLock and deletes the element. If the element has already been deleted by the
walsender, it gets deleted a second time and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached or not.

2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt.
In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait
with a suitable warning, as follows:

a. set QueryCancelPending to false
b. ereport outputs a warning
c. SyncRepCancelWait deletes our element from the queue

If another cancel interrupt arrives while we are emitting the warning in step b, errfinish
will call CHECK_FOR_INTERRUPTS, which will raise an ERROR such as "canceling autovacuum
task", and the process will jump back to the sigsetjmp. Unfortunately, step c is then skipped
and the element that should have been deleted by SyncRepCancelWait remains in the queue.

The easiest way to fix this is to swap the order of steps b and c. On the other hand,
letting the sigsetjmp block clean up the queue may also be a good choice. What do you think?

Attached is the patch; any feedback is greatly appreciated.

Best regards,
--
Dongming Liu

#3 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dongming Liu (#1)
Re: Problem with synchronous replication

Hello.

At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in

Hi,

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached and, if it is not,
acquires SyncRepLock and deletes the element. If the element has already been deleted by the
walsender, it gets deleted a second time and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached or not.

I think you're right here.

2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt.
In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait
with a suitable warning, as follows:

a. set QueryCancelPending to false
b. ereport outputs a warning
c. SyncRepCancelWait deletes our element from the queue

If another cancel interrupt arrives while we are emitting the warning in step b, errfinish
will call CHECK_FOR_INTERRUPTS, which will raise an ERROR such as "canceling autovacuum
task", and the process will jump back to the sigsetjmp. Unfortunately, step c is then skipped
and the element that should have been deleted by SyncRepCancelWait remains in the queue.

The easiest way to fix this is to swap the order of steps b and c. On the other hand,
letting the sigsetjmp block clean up the queue may also be a good choice. What do you think?

Attached is the patch; any feedback is greatly appreciated.

This is not right. It is in transaction commit, so it is inside a
HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
cancel/die interrupts there, so the ereport should return.
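
To illustrate, here is a rough paraphrase of the relevant pieces of
miscadmin.h and postgres.c (not the exact source):

/* CHECK_FOR_INTERRUPTS() only acts when an interrupt is pending. */
#define CHECK_FOR_INTERRUPTS() \
    do { \
        if (InterruptPending) \
            ProcessInterrupts(); \
    } while (0)

void
ProcessInterrupts(void)
{
    /*
     * Transaction commit runs inside HOLD_INTERRUPTS(), so
     * InterruptHoldoffCount > 0 here and we return without raising an
     * ERROR; the pending cancel/die request is serviced later, once
     * interrupts are resumed.
     */
    if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
        return;

    /* ... otherwise handle ProcDiePending / QueryCancelPending ... */
}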

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4 Michael Paquier
michael@paquier.xyz
In reply to: Dongming Liu (#2)
Re: Problem with synchronous replication

On Tue, Oct 29, 2019 at 01:40:41PM +0800, Dongming Liu wrote:

Can someone help me confirm that these two problems are bugs?
If they are, please help review the patch or suggest better
fixes.

There is no need to send periodic pings. Sometimes it takes time to
get replies, as time is an important resource that is always limited.
I can see that Horiguchi-san has already provided some feedback, and I
am now looking at your suggestions.
--
Michael

#5 Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#3)
1 attachment(s)
Re: Problem with synchronous replication

On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:

At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached and, if it is not,
acquires SyncRepLock and deletes the element. If the element has already been deleted by the
walsender, it gets deleted a second time and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached or not.

I think you're right here.

Indeed. Looking at the surroundings, we expect some code paths to hold
SyncRepLock in exclusive or shared mode, but we don't actually check
that the lock is held. So let's add some assertions while we're at it.

This is not right. It is in transaction commit, so it is inside a
HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
cancel/die interrupts there, so the ereport should return.

Yeah. There is an easy way to check for that: InterruptHoldoffCount
needs to be strictly positive.

My suggestions are attached. Any thoughts?
--
Michael

Attachments:

syncrep-lwlocks.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index a21f7d3347..c11945d1c2 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -149,6 +149,13 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	const char *old_status;
 	int			mode;
 
+	/*
+	 * This should be called while holding interrupts during a transaction
+	 * commit to prevent the follow-up shared memory queue cleanups to be
+	 * influenced by external interruptions.
+	 */
+	Assert(InterruptHoldoffCount > 0);
+
 	/* Cap the level for anything other than commit to remote flush only. */
 	if (commit)
 		mode = SyncRepWaitMode;
@@ -361,12 +368,10 @@ SyncRepCancelWait(void)
 void
 SyncRepCleanupAtProcExit(void)
 {
+	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 	if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
-	{
-		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 		SHMQueueDelete(&(MyProc->syncRepLinks));
-		LWLockRelease(SyncRepLock);
-	}
+	LWLockRelease(SyncRepLock);
 }
 
 /*
@@ -508,6 +513,8 @@ SyncRepReleaseWaiters(void)
 /*
  * Calculate the synced Write, Flush and Apply positions among sync standbys.
  *
+ * The caller must hold SyncRepLock.
+ *
  * Return false if the number of sync standbys is less than
  * synchronous_standby_names specifies. Otherwise return true and
  * store the positions into *writePtr, *flushPtr and *applyPtr.
@@ -521,6 +528,8 @@ SyncRepGetSyncRecPtr(XLogRecPtr *writePtr, XLogRecPtr *flushPtr,
 {
 	List	   *sync_standbys;
 
+	Assert(LWLockHeldByMe(SyncRepLock));
+
 	*writePtr = InvalidXLogRecPtr;
 	*flushPtr = InvalidXLogRecPtr;
 	*applyPtr = InvalidXLogRecPtr;
@@ -680,6 +689,8 @@ cmp_lsn(const void *a, const void *b)
 List *
 SyncRepGetSyncStandbys(bool *am_sync)
 {
+	Assert(LWLockHeldByMe(SyncRepLock));
+
 	/* Set default result */
 	if (am_sync != NULL)
 		*am_sync = false;
@@ -984,7 +995,7 @@ SyncRepGetStandbyPriority(void)
  * Pass all = true to wake whole queue; otherwise, just wake up to
  * the walsender's LSN.
  *
- * Must hold SyncRepLock.
+ * The caller must hold SyncRepLock in exclusive mode.
  */
 static int
 SyncRepWakeQueue(bool all, int mode)
@@ -995,6 +1006,7 @@ SyncRepWakeQueue(bool all, int mode)
 	int			numprocs = 0;
 
 	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
+	Assert(LWLockHeldByMeInMode(SyncRepLock, LW_EXCLUSIVE));
 	Assert(SyncRepQueueIsOrderedByLSN(mode));
 
 	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
#6 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#5)
1 attachment(s)
Re: Problem with synchronous replication

At Wed, 30 Oct 2019 10:45:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in

On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:

At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached and, if it is not,
acquires SyncRepLock and deletes the element. If the element has already been deleted by the
walsender, it gets deleted a second time and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached or not.

I think you're right here.

Indeed. Looking at the surroundings, we expect some code paths to hold
SyncRepLock in exclusive or shared mode, but we don't actually check
that the lock is held. So let's add some assertions while we're at it.

I looked at this more closely.

If we do that strictly, other functions like
SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static
functions don't need Assert()s; a caution in their comments would be
enough.

On the other hand, the similar-looking code in SyncRepInitConfig and
SyncRepUpdateSyncStandbysDefined seems safe since, AFAICS, it doesn't
have (this kind of) race condition on writes. It might need a comment
saying so. Or we could err on the (apparently) safer side by applying
the same amendment to it.

SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without
holding SyncRepLock, which could lead to a message with the wrong
priority. I'm not sure it matters, though.

This is not right. It is in transaction commit, so it is inside a
HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
cancel/die interrupts there, so the ereport should return.

Yeah. There is an easy way to check for that: InterruptHoldoffCount
needs to be strictly positive.

My suggestions are attached. Any thoughts?

Seems reasonable for holdoffs. The same assertion would be needed in
more places, but that's another issue.

By the way, while I was looking at this, I found a typo. Please find
the attached patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

syncrep_comment_typo.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index a21f7d3347..16aee1de4c 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -1065,8 +1065,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
 		/*
 		 * If synchronous_standby_names has been reset to empty, it's futile
-		 * for backends to continue to waiting.  Since the user no longer
-		 * wants synchronous replication, we'd better wake them up.
+		 * for backends to continue waiting.  Since the user no longer wants
+		 * synchronous replication, we'd better wake them up.
 		 */
 		if (!sync_standbys_defined)
 		{
#7 Dongming Liu
ldming101@gmail.com
In reply to: Michael Paquier (#4)
Re: Problem with synchronous replication

On Wed, Oct 30, 2019 at 9:12 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Oct 29, 2019 at 01:40:41PM +0800, Dongming Liu wrote:

Can someone help me confirm that these two problems are bugs?
If they are, please help review the patch or suggest better
fixes.

There is no need to send periodic pings. Sometimes it takes time to
get replies, as time is an important resource that is always limited.

Thank you for your reply. I realize my mistake; thank you for
correcting me.

I can see that Horiguchi-san has already provided some feedback, and I
am now looking at your suggestions.

Thanks again.

#8 lingce.ldm
lingce.ldm@alibaba-inc.com
In reply to: Kyotaro Horiguchi (#3)
Re: Problem with synchronous replication

On Oct 29, 2019, at 18:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Hello.

At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in

Hi,

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached and, if it is not,
acquires SyncRepLock and deletes the element. If the element has already been deleted by the
walsender, it gets deleted a second time and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached or not.

I think you're right here.

Thanks.

2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt.
In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait
with a suitable warning, as follows:

a. set QueryCancelPending to false
b. ereport outputs a warning
c. SyncRepCancelWait deletes our element from the queue

If another cancel interrupt arrives while we are emitting the warning in step b, errfinish
will call CHECK_FOR_INTERRUPTS, which will raise an ERROR such as "canceling autovacuum
task", and the process will jump back to the sigsetjmp. Unfortunately, step c is then skipped
and the element that should have been deleted by SyncRepCancelWait remains in the queue.

The easiest way to fix this is to swap the order of steps b and c. On the other hand,
letting the sigsetjmp block clean up the queue may also be a good choice. What do you think?

Attached is the patch; any feedback is greatly appreciated.

This is not right. It is in transaction commit, so it is inside a
HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
cancel/die interrupts there, so the ereport should return.

I read the relevant code; you are right. I had called SyncRepWaitForLSN somewhere else
but forgot to put it inside a HOLD_INTERRUPTS section, which triggered the exception.

regards.


Dongming Liu

#9 lingce.ldm
lingce.ldm@alibaba-inc.com
In reply to: Michael Paquier (#5)
Re: Problem with synchronous replication

On Oct 30, 2019, at 09:45, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:

At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached and, if it is not,
acquires SyncRepLock and deletes the element. If the element has already been deleted by the
walsender, it gets deleted a second time and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached or not.

I think you're right here.

Indeed. Looking at the surroundings, we expect some code paths to hold
SyncRepLock in exclusive or shared mode, but we don't actually check
that the lock is held. So let's add some assertions while we're at it.

This is not right. It is in transaction commit, so it is inside a
HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
cancel/die interrupts there, so the ereport should return.

Yeah. There is an easy way to check for that: InterruptHoldoffCount
needs to be strictly positive.

My suggestions are attached. Any thoughts?

Thanks, this patch looks good to me.

#10 Fujii Masao
masao.fujii@gmail.com
In reply to: lingce.ldm (#8)
Re: Problem with synchronous replication

On Wed, Oct 30, 2019 at 4:16 PM lingce.ldm <lingce.ldm@alibaba-inc.com> wrote:

On Oct 29, 2019, at 18:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Hello.

At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in

Hi,

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached and, if it is not,
acquires SyncRepLock and deletes the element. If the element has already been deleted by the
walsender, it gets deleted a second time and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached or not.

I think you're right here.

This change causes every exiting backend to always take the exclusive lock
even when it's not in the SyncRep queue. This may be problematic, for example,
when terminating multiple backends at the same time? If so,
it might be better to check SHMQueueIsDetached() again after taking the lock.
That is,

if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
{
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
        SHMQueueDelete(&(MyProc->syncRepLinks));
    LWLockRelease(SyncRepLock);
}

Regards,

--
Fujii Masao

#11 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#10)
Re: Problem with synchronous replication

Hello.

At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in

This change causes every exiting backend to always take the exclusive lock
even when it's not in the SyncRep queue. This may be problematic, for example,
when terminating multiple backends at the same time? If so,
it might be better to check SHMQueueIsDetached() again after taking the lock.
That is,

I'm not sure how much harm that does, but double-checked locking
(releasing) is simple enough for reducing possible congestion here, I
think. In short, +1 for that.

if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
{
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
        SHMQueueDelete(&(MyProc->syncRepLinks));
    LWLockRelease(SyncRepLock);
}

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12 Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#10)
Re: Problem with synchronous replication

On Wed, Oct 30, 2019 at 05:21:17PM +0900, Fujii Masao wrote:

This change causes every exiting backend to always take the exclusive lock
even when it's not in the SyncRep queue. This may be problematic, for example,
when terminating multiple backends at the same time? If so,
it might be better to check SHMQueueIsDetached() again after taking the lock.
That is,

if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
{
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
        SHMQueueDelete(&(MyProc->syncRepLinks));
    LWLockRelease(SyncRepLock);
}

Makes sense. Thanks for the suggestion.
--
Michael

#13 Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#6)
Re: Problem with synchronous replication

On Wed, Oct 30, 2019 at 12:34:28PM +0900, Kyotaro Horiguchi wrote:

If we do that strictly, other functions like
SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static
functions don't need Assert()s; a caution in their comments would be
enough.

Perhaps. I'd rather be careful, though, if we meddle with this code
again in the future, as it is shared across multiple places and
callers.

SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without
holding SyncRepLock, which could lead to a message with the wrong
priority. I'm not sure it matters, though.

The WAL sender is the only writer of its own info in shared memory, so
there is no problem in having it read that data without holding its spinlock.

Seems reasonable for holdoffs. The same assertion would be needed in
more places, but that's another issue.

Sure.

By the way, while I was looking at this, I found a typo. Please find
the attached patch.

Thanks, applied that one.
--
Michael

#14 Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#11)
1 attachment(s)
Re: Problem with synchronous replication

On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:

At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in

This change causes every exiting backend to always take the exclusive lock
even when it's not in the SyncRep queue. This may be problematic, for example,
when terminating multiple backends at the same time? If so,
it might be better to check SHMQueueIsDetached() again after taking the lock.
That is,

I'm not sure how much harm that does, but double-checked locking
(releasing) is simple enough for reducing possible congestion here, I
think.

FWIW, I could not measure any actual difference with pgbench -C, up to
500 sessions, an empty input file (with just one meta-command), and
-c 20.

I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
the patch with the suggestion from Fujii-san. Any comments?
--
Michael

Attachments:

syncrep-lwlocks-v2.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 16aee1de4c..f72b8a75f3 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -149,6 +149,13 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	const char *old_status;
 	int			mode;
 
+	/*
+	 * This should be called while holding interrupts during a transaction
+	 * commit to prevent the follow-up shared memory queue cleanups to be
+	 * influenced by external interruptions.
+	 */
+	Assert(InterruptHoldoffCount > 0);
+
 	/* Cap the level for anything other than commit to remote flush only. */
 	if (commit)
 		mode = SyncRepWaitMode;
@@ -361,10 +368,18 @@ SyncRepCancelWait(void)
 void
 SyncRepCleanupAtProcExit(void)
 {
+	/*
+	 * First check if we are removed from the queue without the lock to
+	 * not slow down backend exit.
+	 */
 	if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
 	{
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
-		SHMQueueDelete(&(MyProc->syncRepLinks));
+
+		/* maybe we have just been removed, so recheck */
+		if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
+			SHMQueueDelete(&(MyProc->syncRepLinks));
+
 		LWLockRelease(SyncRepLock);
 	}
 }
@@ -508,6 +523,8 @@ SyncRepReleaseWaiters(void)
 /*
  * Calculate the synced Write, Flush and Apply positions among sync standbys.
  *
+ * The caller must hold SyncRepLock.
+ *
  * Return false if the number of sync standbys is less than
  * synchronous_standby_names specifies. Otherwise return true and
  * store the positions into *writePtr, *flushPtr and *applyPtr.
@@ -521,6 +538,8 @@ SyncRepGetSyncRecPtr(XLogRecPtr *writePtr, XLogRecPtr *flushPtr,
 {
 	List	   *sync_standbys;
 
+	Assert(LWLockHeldByMe(SyncRepLock));
+
 	*writePtr = InvalidXLogRecPtr;
 	*flushPtr = InvalidXLogRecPtr;
 	*applyPtr = InvalidXLogRecPtr;
@@ -680,6 +699,8 @@ cmp_lsn(const void *a, const void *b)
 List *
 SyncRepGetSyncStandbys(bool *am_sync)
 {
+	Assert(LWLockHeldByMe(SyncRepLock));
+
 	/* Set default result */
 	if (am_sync != NULL)
 		*am_sync = false;
@@ -984,7 +1005,7 @@ SyncRepGetStandbyPriority(void)
  * Pass all = true to wake whole queue; otherwise, just wake up to
  * the walsender's LSN.
  *
- * Must hold SyncRepLock.
+ * The caller must hold SyncRepLock in exclusive mode.
  */
 static int
 SyncRepWakeQueue(bool all, int mode)
@@ -995,6 +1016,7 @@ SyncRepWakeQueue(bool all, int mode)
 	int			numprocs = 0;
 
 	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
+	Assert(LWLockHeldByMeInMode(SyncRepLock, LW_EXCLUSIVE));
 	Assert(SyncRepQueueIsOrderedByLSN(mode));
 
 	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
#15 Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#14)
Re: Problem with synchronous replication

On Thu, Oct 31, 2019 at 11:12 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:

At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in

This change causes every exiting backend to always take the exclusive lock
even when it's not in the SyncRep queue. This may be problematic, for example,
when terminating multiple backends at the same time? If so,
it might be better to check SHMQueueIsDetached() again after taking the lock.
That is,

I'm not sure how much harm that does, but double-checked locking
(releasing) is simple enough for reducing possible congestion here, I
think.

FWIW, I could not measure any actual difference with pgbench -C, up to
500 sessions, an empty input file (with just one meta-command), and
-c 20.

I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
the patch with the suggestion from Fujii-san. Any comments?

Thanks for the patch! Looks good to me.

Regards,

--
Fujii Masao

#16 lingce.ldm
lingce.ldm@alibaba-inc.com
In reply to: Michael Paquier (#14)
Re: Problem with synchronous replication

On Oct 31, 2019, at 10:11, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:

At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in

This change causes every exiting backend to always take the exclusive lock
even when it's not in the SyncRep queue. This may be problematic, for example,
when terminating multiple backends at the same time? If so,
it might be better to check SHMQueueIsDetached() again after taking the lock.
That is,

I'm not sure how much harm that does, but double-checked locking
(releasing) is simple enough for reducing possible congestion here, I
think.

FWIW, I could not measure any actual difference with pgbench -C, up to
500 sessions, an empty input file (with just one meta-command), and
-c 20.

I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
the patch with the suggestion from Fujii-san. Any comments?

Thanks for the patch. Looks good to me, +1.

Regards,


Dongming Liu

#17 Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#15)
Re: Problem with synchronous replication

On Thu, Oct 31, 2019 at 05:38:32PM +0900, Fujii Masao wrote:

Thanks for the patch! Looks good to me.

Thanks. I have applied the core fix down to 9.4, leaving the new
assertion improvements only for HEAD.
--
Michael