SyncRepLock acquired exclusively in default configuration

masahiko.sawada@2ndquadrant.com

almost 6 years ago

In reply to: Andres Freund (#1)

Re: SyncRepLock acquired exclusively in default configuration

On Mon, 6 Apr 2020 at 14:04, Andres Freund <andres@anarazel.de> wrote:

Hi,

Due to the change below, when using the default postgres configuration
of ynchronous_commit = on, max_wal_senders = 10, will now acquire a new
exclusive lwlock after writing a commit record.

Indeed.

commit 48c9f4926562278a2fd2b85e7486c6d11705f177
Author: Simon Riggs <simon@2ndQuadrant.com>
Date: 2017-12-29 14:30:33 +0000

Fix race condition when changing synchronous_standby_names

A momentary window exists when synchronous_standby_names
changes that allows commands issued after the change to
continue to act as async until the change becomes visible.
Remove the race by using more appropriate test in syncrep.c

Author: Asim Rama Praveen and Ashwin Agrawal
Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen
Reviewed-by: Michael Paquier, Masahiko Sawada

As far as I can tell there was no discussion about the added contention
due this change in the relevant thread [1].

The default configuration has an empty synchronous_standby_names. Before
this change we'd fall out of SyncRepWaitForLSN() before acquiring
SyncRepLock in exlusive mode. Now we don't anymore.

I'm really not ok with unneccessarily adding an exclusive lock
acquisition to such a crucial path.

I think we can acquire SyncRepLock in share mode once to check
WalSndCtl->sync_standbys_defined and if it's true then check it again
after acquiring it in exclusive mode. But it in turn ends up with
adding one extra LWLockAcquire and LWLockRelease in sync rep path.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Ashwin Agrawal

aagrawal@pivotal.io

almost 6 years ago

In reply to: Masahiko Sawada (#2)

Re: SyncRepLock acquired exclusively in default configuration

On Mon, Apr 6, 2020 at 1:52 AM Masahiko Sawada <
masahiko.sawada@2ndquadrant.com> wrote:

On Mon, 6 Apr 2020 at 14:04, Andres Freund <andres@anarazel.de> wrote:

commit 48c9f4926562278a2fd2b85e7486c6d11705f177
Author: Simon Riggs <simon@2ndQuadrant.com>
Date: 2017-12-29 14:30:33 +0000

Fix race condition when changing synchronous_standby_names

A momentary window exists when synchronous_standby_names
changes that allows commands issued after the change to
continue to act as async until the change becomes visible.
Remove the race by using more appropriate test in syncrep.c

Author: Asim Rama Praveen and Ashwin Agrawal
Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen
Reviewed-by: Michael Paquier, Masahiko Sawada

As far as I can tell there was no discussion about the added contention
due this change in the relevant thread [1].

The default configuration has an empty synchronous_standby_names. Before
this change we'd fall out of SyncRepWaitForLSN() before acquiring
SyncRepLock in exlusive mode. Now we don't anymore.

I'm really not ok with unneccessarily adding an exclusive lock
acquisition to such a crucial path.

I think we can acquire SyncRepLock in share mode once to check
WalSndCtl->sync_standbys_defined and if it's true then check it again
after acquiring it in exclusive mode. But it in turn ends up with
adding one extra LWLockAcquire and LWLockRelease in sync rep path.

How about we change it to this ?

diff --git a/src/backend/replication/syncrep.c
b/src/backend/replication/syncrep.c
index ffd5b31eb2..cdb82a8b28 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -165,8 +165,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
        /*
         * Fast exit if user has not requested sync replication.
         */
-       if (!SyncRepRequested())
-               return;
+       if (!SyncRepRequested() || !SyncStandbysDefined())
+       {
+               if (!WalSndCtl->sync_standbys_defined)
+                       return;
+       }

Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
Assert(WalSndCtl != NULL);

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Ashwin Agrawal (#3)

Re: SyncRepLock acquired exclusively in default configuration

Hi,

On 2020-04-06 11:51:06 -0700, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 1:52 AM Masahiko Sawada <
masahiko.sawada@2ndquadrant.com> wrote:

On Mon, 6 Apr 2020 at 14:04, Andres Freund <andres@anarazel.de> wrote:

I'm really not ok with unneccessarily adding an exclusive lock
acquisition to such a crucial path.

I think we can acquire SyncRepLock in share mode once to check
WalSndCtl->sync_standbys_defined and if it's true then check it again
after acquiring it in exclusive mode. But it in turn ends up with
adding one extra LWLockAcquire and LWLockRelease in sync rep path.

That's still too much. Adding another lwlock acquisition, where the same
lock is acquired by all backends (contrasting e.g. to buffer locks), to
the commit path, for the benefit of a feature that the vast majority of
people aren't going to use, isn't good.

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Greetings,

Andres Freund

masahiko.sawada@2ndquadrant.com

almost 6 years ago

In reply to: Andres Freund (#4)

Re: SyncRepLock acquired exclusively in default configuration

On Tue, 7 Apr 2020 at 06:14, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-04-06 11:51:06 -0700, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 1:52 AM Masahiko Sawada <
masahiko.sawada@2ndquadrant.com> wrote:

On Mon, 6 Apr 2020 at 14:04, Andres Freund <andres@anarazel.de> wrote:

I'm really not ok with unneccessarily adding an exclusive lock
acquisition to such a crucial path.

I think we can acquire SyncRepLock in share mode once to check
WalSndCtl->sync_standbys_defined and if it's true then check it again
after acquiring it in exclusive mode. But it in turn ends up with
adding one extra LWLockAcquire and LWLockRelease in sync rep path.

That's still too much. Adding another lwlock acquisition, where the same
lock is acquired by all backends (contrasting e.g. to buffer locks), to
the commit path, for the benefit of a feature that the vast majority of
people aren't going to use, isn't good.

Agreed.

In this case it seems okay to read WalSndCtl->sync_standbys_defined
without SyncRepLock before we acquire SyncRepLock in exclusive mode.
While changing WalSndCtl->sync_standbys_defined to true, in the
current code a backend who reached SyncRepWaitForLSN() waits on
SyncRepLock, see sync_standbys_defined is true and enqueue itself.
With this change, since we don't acquire SyncRepLock to read
WalSndCtl->sync_standbys_defined these backends return without waiting
for the change of WalSndCtl->sync_standbys_defined but it would not be
a problem. Similarly, I've considered the case where changing to
false, but I think there is no problem.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Ashwin Agrawal

aagrawal@pivotal.io

almost 6 years ago

In reply to: Andres Freund (#4)

Re: SyncRepLock acquired exclusively in default configuration

On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de> wrote:

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

That's the part, I am not fully sure about. But reading the comment above
SyncRepUpdateSyncStandbysDefined(), it seems fine.

Bring back the check which existed based on GUC but instead of just
blindly

returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.

I wasn't fully thinking there, as I got distracted by if lock will be
required or not for reading the same. If lock was required then checking
for guc first would have been better, but seems not required.

masao.fujii@oss.nttdata.com

almost 6 years ago

In reply to: Ashwin Agrawal (#6)

1 attachment(s)

Re: SyncRepLock acquired exclusively in default configuration

On 2020/04/08 3:01, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seems fine.

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.

So the consensus is something like the following? Patch attached.

         /*
-        * Fast exit if user has not requested sync replication.
+        * Fast exit if user has not requested sync replication, or there are no
+        * sync replication standby names defined.
          */
-       if (!SyncRepRequested())
+       if (!SyncRepRequested() ||
+               !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
                 return;

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

syncreplock.patchtext/plain; charset=UTF-8; name=syncreplock.patch; x-mac-creator=0; x-mac-type=0Download

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index ffd5b31eb2..c6df3f4da6 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -163,9 +163,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
 
 	/*
-	 * Fast exit if user has not requested sync replication.
+	 * Fast exit if user has not requested sync replication, or there are no
+	 * sync replication standby names defined.
 	 */
-	if (!SyncRepRequested())
+	if (!SyncRepRequested() ||
+		!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
 		return;
 
 	Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));

masahiko.sawada@2ndquadrant.com

almost 6 years ago

In reply to: Fujii Masao (#7)

Re: SyncRepLock acquired exclusively in default configuration

On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/04/08 3:01, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seems fine.

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.

So the consensus is something like the following? Patch attached.
/*
-        * Fast exit if user has not requested sync replication.
+        * Fast exit if user has not requested sync replication, or there are no
+        * sync replication standby names defined.
*/
-       if (!SyncRepRequested())
+       if (!SyncRepRequested() ||
+               !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;

I think we need more comments describing why checking
sync_standby_defined without SyncRepLock is safe here. For example:

This routine gets called every commit time. So, to check if the
synchronous standbys is defined as quick as possible we check
WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since
we make this test unlocked, there's a change we might fail to notice
that it has been turned off and continue processing. But since the
subsequent check will check it again while holding SyncRepLock, it's
no problem. Similarly even if we fail to notice that it has been
turned on, it's okay to return quickly since all backend consistently
behaves so.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

masao.fujii@oss.nttdata.com

almost 6 years ago

In reply to: Masahiko Sawada (#8)

Re: SyncRepLock acquired exclusively in default configuration

On 2020/04/10 14:11, Masahiko Sawada wrote:

On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/08 3:01, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seems fine.

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.

So the consensus is something like the following? Patch attached.
/*
-        * Fast exit if user has not requested sync replication.
+        * Fast exit if user has not requested sync replication, or there are no
+        * sync replication standby names defined.
*/
-       if (!SyncRepRequested())
+       if (!SyncRepRequested() ||
+               !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;
I think we need more comments describing why checking
sync_standby_defined without SyncRepLock is safe here. For example:

Yep, agreed!

This routine gets called every commit time. So, to check if the
synchronous standbys is defined as quick as possible we check
WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since
we make this test unlocked, there's a change we might fail to notice
that it has been turned off and continue processing.

Does this really happen? I was thinking that the problem by not taking
the lock here is that SyncRepWaitForLSN() can see that shared flag after
SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and
before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself
into the wait queue becaues the flag was true, without lock, it may keep
sleeping infinitely.

But since the
subsequent check will check it again while holding SyncRepLock, it's
no problem. Similarly even if we fail to notice that it has been
turned on

Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined()
sets the flag to true, SyncRepWaitForLSN() basically doesn't seem
to fail to notice that. No?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#10

masahiko.sawada@2ndquadrant.com

almost 6 years ago

In reply to: Fujii Masao (#9)

Re: SyncRepLock acquired exclusively in default configuration

On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/04/10 14:11, Masahiko Sawada wrote:
On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/08 3:01, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seems fine.

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.

So the consensus is something like the following? Patch attached.
/*
-        * Fast exit if user has not requested sync replication.
+        * Fast exit if user has not requested sync replication, or there are no
+        * sync replication standby names defined.
*/
-       if (!SyncRepRequested())
+       if (!SyncRepRequested() ||
+               !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;
I think we need more comments describing why checking
sync_standby_defined without SyncRepLock is safe here. For example:
Yep, agreed!

This routine gets called every commit time. So, to check if the
synchronous standbys is defined as quick as possible we check
WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since
we make this test unlocked, there's a change we might fail to notice
that it has been turned off and continue processing.

Does this really happen? I was thinking that the problem by not taking
the lock here is that SyncRepWaitForLSN() can see that shared flag after
SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and
before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself
into the wait queue becaues the flag was true, without lock, it may keep
sleeping infinitely.

I think that because a backend does the following check after
acquiring SyncRepLock, in that case, once the backend has taken
SyncRepLock it can see that sync_standbys_defined is false and return.
But you meant that we do both checks without SyncRepLock?

/*
* We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
* set. See SyncRepUpdateSyncStandbysDefined.
*
* Also check that the standby hasn't already replied. Unlikely race
* condition but we'll be fetching that cache line anyway so it's likely
* to be a low cost check.
*/
if (!WalSndCtl->sync_standbys_defined ||
lsn <= WalSndCtl->lsn[mode])
{
LWLockRelease(SyncRepLock);
return;
}

But since the
subsequent check will check it again while holding SyncRepLock, it's
no problem. Similarly even if we fail to notice that it has been
turned on

Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined()
sets the flag to true, SyncRepWaitForLSN() basically doesn't seem
to fail to notice that. No?

What I wanted to say is, in the current code, while the checkpointer
process is holding SyncRepLock to turn off sync_standbys_defined,
backends who reach SyncRepWaitForLSN() wait for the lock. Then, after
the checkpointer process releases SyncRepLock these backend can
enqueue themselves to the wait queue because they can see that
sync_standbys_defined is turned on. On the other hand if we do the
check without SyncRepLock, backends who reach SyncRepWaitForLSN() will
return instead of waiting, in spite of checkpointer process being
turning on sync_standbys_defined. Which means these backends are
failing to notice that it has been turned on, I thought.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11

masao.fujii@oss.nttdata.com

almost 6 years ago

In reply to: Masahiko Sawada (#10)

Re: SyncRepLock acquired exclusively in default configuration

On 2020/04/10 20:56, Masahiko Sawada wrote:

On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/10 14:11, Masahiko Sawada wrote:
On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/08 3:01, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seems fine.

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.

So the consensus is something like the following? Patch attached.
/*
-        * Fast exit if user has not requested sync replication.
+        * Fast exit if user has not requested sync replication, or there are no
+        * sync replication standby names defined.
*/
-       if (!SyncRepRequested())
+       if (!SyncRepRequested() ||
+               !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;
I think we need more comments describing why checking
sync_standby_defined without SyncRepLock is safe here. For example:
Yep, agreed!

This routine gets called every commit time. So, to check if the
synchronous standbys is defined as quick as possible we check
WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since
we make this test unlocked, there's a change we might fail to notice
that it has been turned off and continue processing.

Does this really happen? I was thinking that the problem by not taking
the lock here is that SyncRepWaitForLSN() can see that shared flag after
SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and
before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself
into the wait queue becaues the flag was true, without lock, it may keep
sleeping infinitely.
I think that because a backend does the following check after
acquiring SyncRepLock, in that case, once the backend has taken
SyncRepLock it can see that sync_standbys_defined is false and return.

Yes, but the backend can see that sync_standby_defined indicates false
whether holding SyncRepLock or not, after the checkpointer sets it to false.

But you meant that we do both checks without SyncRepLock?

Maybe No. The change that the latest patch provides should be applied, I think.
That is, sync_standbys_defined should be check without lock at first, then
only if it's true, it should be checked again with lock.

ISTM that basically SyncRepLock is used in SyncRepWaitForLSN() and
SyncRepUpdateSyncStandbysDefined() to make operation on the queue
and enabling sync_standbys_defined atomic. Without lock, the issue that
the comment in SyncRepUpdateSyncStandbysDefined() explains would
happen. That is, the backend may keep waiting infinitely as follows.

1. checkpointer calls SyncRepUpdateSyncStandbysDefined()
2. checkpointer sees that the flag indicates true but the config indicates false
3. checkpointer takes lock and wakes up all the waiters
4. backend calls SyncRepWaitForLSN() can see that the flag indicates true
5. checkpointer sets the flag to false and releases the lock
6. backend adds itself to the queue and wait until it's waken up, but will not happen immediately

So after the backend sees that the flag indicates true without lock,
it must check the flag again with lock immediately without operating
the queue. If this my understanding is right, I was thinking that
the comment should mention these things.

/*
* We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
* set. See SyncRepUpdateSyncStandbysDefined.
*
* Also check that the standby hasn't already replied. Unlikely race
* condition but we'll be fetching that cache line anyway so it's likely
* to be a low cost check.
*/
if (!WalSndCtl->sync_standbys_defined ||
lsn <= WalSndCtl->lsn[mode])
{
LWLockRelease(SyncRepLock);
return;
}

But since the
subsequent check will check it again while holding SyncRepLock, it's
no problem. Similarly even if we fail to notice that it has been
turned on

Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined()
sets the flag to true, SyncRepWaitForLSN() basically doesn't seem
to fail to notice that. No?

What I wanted to say is, in the current code, while the checkpointer
process is holding SyncRepLock to turn off sync_standbys_defined,
backends who reach SyncRepWaitForLSN() wait for the lock. Then, after
the checkpointer process releases SyncRepLock these backend can
enqueue themselves to the wait queue because they can see that
sync_standbys_defined is turned on.

In this case, since the checkpointer turned the flag off while holding
the lock, the backend sees that the flag is turned off, and doesn't
enqueue itself. No?

On the other hand if we do the
check without SyncRepLock, backends who reach SyncRepWaitForLSN() will
return instead of waiting, in spite of checkpointer process being
turning on sync_standbys_defined. Which means these backends are
failing to notice that it has been turned on, I thought.

No. Or I'm missing something... In this case, the backend sees that
the flag is turned on without lock since checkpointer turned it on.
So you're thinking the following. Right?

1. sync_standbys_defined flag is false
2. checkpointer takes the lock and turns the flag on
3. backend sees the flag
4. checkpointer releases the lock

In #3, the flag indicates true, I think. But you think it's false?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#12

masahiko.sawada@2ndquadrant.com

almost 6 years ago

In reply to: Fujii Masao (#11)

Re: SyncRepLock acquired exclusively in default configuration

On Fri, 10 Apr 2020 at 21:52, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/04/10 20:56, Masahiko Sawada wrote:
On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/10 14:11, Masahiko Sawada wrote:
On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/08 3:01, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seems fine.

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.

So the consensus is something like the following? Patch attached.
/*
-        * Fast exit if user has not requested sync replication.
+        * Fast exit if user has not requested sync replication, or there are no
+        * sync replication standby names defined.
*/
-       if (!SyncRepRequested())
+       if (!SyncRepRequested() ||
+               !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;
I think we need more comments describing why checking
sync_standby_defined without SyncRepLock is safe here. For example:
Yep, agreed!

This routine gets called every commit time. So, to check if the
synchronous standbys is defined as quick as possible we check
WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since
we make this test unlocked, there's a change we might fail to notice
that it has been turned off and continue processing.

Does this really happen? I was thinking that the problem by not taking
the lock here is that SyncRepWaitForLSN() can see that shared flag after
SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and
before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself
into the wait queue becaues the flag was true, without lock, it may keep
sleeping infinitely.
I think that because a backend does the following check after
acquiring SyncRepLock, in that case, once the backend has taken
SyncRepLock it can see that sync_standbys_defined is false and return.
Yes, but the backend can see that sync_standby_defined indicates false
whether holding SyncRepLock or not, after the checkpointer sets it to false.

But you meant that we do both checks without SyncRepLock?

Maybe No. The change that the latest patch provides should be applied, I think.
That is, sync_standbys_defined should be check without lock at first, then
only if it's true, it should be checked again with lock.

Yes. My understanding is the same.

After applying your patch, SyncRepWaitForLSN() is going to become
something like:

/*
* Fast exit if user has not requested sync replication, or there are no
* sync replication standby names defined.
*/
if (!SyncRepRequested() ||
!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;

Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
Assert(WalSndCtl != NULL);

LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);

/*
* Set our waitLSN so WALSender will know when to wake us, and add
* ourselves to the queue.
*/
MyProc->waitLSN = lsn;
MyProc->syncRepState = SYNC_REP_WAITING;
SyncRepQueueInsert(mode);
Assert(SyncRepQueueIsOrderedByLSN(mode));
LWLockRelease(SyncRepLock);

There are two checks of sync_standbys_defined. The first check is
performed without SyncRepLock and the second check is performed with
SyncRepLock. That's what you and I are expecting. Right?

ISTM that basically SyncRepLock is used in SyncRepWaitForLSN() and
SyncRepUpdateSyncStandbysDefined() to make operation on the queue
and enabling sync_standbys_defined atomic. Without lock, the issue that
the comment in SyncRepUpdateSyncStandbysDefined() explains would
happen. That is, the backend may keep waiting infinitely as follows.

Let me think the following sequence after applying your changes:

1. checkpointer calls SyncRepUpdateSyncStandbysDefined()
2. checkpointer sees that the flag indicates true but the config indicates false
3. checkpointer takes lock and wakes up all the waiters
4. backend calls SyncRepWaitForLSN() can see that the flag indicates true

Yes, I suppose this is the first check of sync_standbys_defined.

And before the second check, backend tries to acquire SyncRepLock but
since the lock is already being held by checkohpointer it must wait.

5. checkpointer sets the flag to false and releases the lock

After checkpointer release the lock, the backend is woken up.

6. backend adds itself to the queue and wait until it's waken up, but will not happen immediately

The backend sees that the flag has been false at the second check, so return.

If we didn't acquire SyncRepLock even for the second check I think the
backend would keep waiting infinitely as you mentioned.

So after the backend sees that the flag indicates true without lock,
it must check the flag again with lock immediately without operating
the queue. If this my understanding is right, I was thinking that
the comment should mention these things.

I think that's right. I was going to describe why we do the first
check without SyncRepLock and why it is safe but it seems to me that
these things you mentioned are related to the second check, if I'm not
missing something.

/*
* We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
* set. See SyncRepUpdateSyncStandbysDefined.
*
* Also check that the standby hasn't already replied. Unlikely race
* condition but we'll be fetching that cache line anyway so it's likely
* to be a low cost check.
*/
if (!WalSndCtl->sync_standbys_defined ||
lsn <= WalSndCtl->lsn[mode])
{
LWLockRelease(SyncRepLock);
return;
}

But since the
subsequent check will check it again while holding SyncRepLock, it's
no problem. Similarly even if we fail to notice that it has been
turned on

Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined()
sets the flag to true, SyncRepWaitForLSN() basically doesn't seem
to fail to notice that. No?

What I wanted to say is, in the current code, while the checkpointer
process is holding SyncRepLock to turn off sync_standbys_defined,
backends who reach SyncRepWaitForLSN() wait for the lock. Then, after
the checkpointer process releases SyncRepLock these backend can
enqueue themselves to the wait queue because they can see that
sync_standbys_defined is turned on.

In this case, since the checkpointer turned the flag off while holding
the lock, the backend sees that the flag is turned off, and doesn't
enqueue itself. No?

Oops, I had mistake here. It should be "while the checkpointer process
is holding SyncRepLock to turn *on* sync_standbys_defined, ...".

On the other hand if we do the
check without SyncRepLock, backends who reach SyncRepWaitForLSN() will
return instead of waiting, in spite of checkpointer process being
turning on sync_standbys_defined. Which means these backends are
failing to notice that it has been turned on, I thought.

No. Or I'm missing something... In this case, the backend sees that
the flag is turned on without lock since checkpointer turned it on.
So you're thinking the following. Right?

1. sync_standbys_defined flag is false
2. checkpointer takes the lock and turns the flag on
3. backend sees the flag
4. checkpointer releases the lock

In #3, the flag indicates true, I think. But you think it's false?

I meant the backends who reach SyncRepLock() while checkpointer is at
after acquiring the lock but before turning the flag on, described at
#3 step in the following sequence. Such backends will wait for the
lock in the current code, but after applying the patch they return
quickly. So what I'm thinking is:

1. sync_standbys_defined flag is false
2. checkpointer takes the lock
3. backend sees the flag, and return as it's still false
4. checkpointer turns the flag on
5. checkpointer releases the lock

If a backend reaches SyncRepWaitForLSN() between #4 and #5 it will
wait for the lock and then enqueue itself after acquiring the lock.
But such behavior is not changed before and after applying the patch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13

masahiko.sawada@2ndquadrant.com

over 5 years ago

In reply to: Masahiko Sawada (#12)

Re: SyncRepLock acquired exclusively in default configuration

On Sat, 11 Apr 2020 at 09:30, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Fri, 10 Apr 2020 at 21:52, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/10 20:56, Masahiko Sawada wrote:
On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/10 14:11, Masahiko Sawada wrote:
On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/08 3:01, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seems fine.

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.

So the consensus is something like the following? Patch attached.
/*
-        * Fast exit if user has not requested sync replication.
+        * Fast exit if user has not requested sync replication, or there are no
+        * sync replication standby names defined.
*/
-       if (!SyncRepRequested())
+       if (!SyncRepRequested() ||
+               !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;
I think we need more comments describing why checking
sync_standby_defined without SyncRepLock is safe here. For example:
Yep, agreed!

This routine gets called every commit time. So, to check if the
synchronous standbys is defined as quick as possible we check
WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since
we make this test unlocked, there's a change we might fail to notice
that it has been turned off and continue processing.

Does this really happen? I was thinking that the problem by not taking
the lock here is that SyncRepWaitForLSN() can see that shared flag after
SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and
before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself
into the wait queue becaues the flag was true, without lock, it may keep
sleeping infinitely.
I think that because a backend does the following check after
acquiring SyncRepLock, in that case, once the backend has taken
SyncRepLock it can see that sync_standbys_defined is false and return.
Yes, but the backend can see that sync_standby_defined indicates false
whether holding SyncRepLock or not, after the checkpointer sets it to false.

But you meant that we do both checks without SyncRepLock?

Maybe No. The change that the latest patch provides should be applied, I think.
That is, sync_standbys_defined should be check without lock at first, then
only if it's true, it should be checked again with lock.
Yes. My understanding is the same.

After applying your patch, SyncRepWaitForLSN() is going to become
something like:

/*
* Fast exit if user has not requested sync replication, or there are no
* sync replication standby names defined.
*/
if (!SyncRepRequested() ||
!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;

Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
Assert(WalSndCtl != NULL);

LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);

/*
* We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
* set. See SyncRepUpdateSyncStandbysDefined.
*
* Also check that the standby hasn't already replied. Unlikely race
* condition but we'll be fetching that cache line anyway so it's likely
* to be a low cost check.
*/
if (!WalSndCtl->sync_standbys_defined ||
lsn <= WalSndCtl->lsn[mode])
{
LWLockRelease(SyncRepLock);
return;
}

/*
* Set our waitLSN so WALSender will know when to wake us, and add
* ourselves to the queue.
*/
MyProc->waitLSN = lsn;
MyProc->syncRepState = SYNC_REP_WAITING;
SyncRepQueueInsert(mode);
Assert(SyncRepQueueIsOrderedByLSN(mode));
LWLockRelease(SyncRepLock);

There are two checks of sync_standbys_defined. The first check is
performed without SyncRepLock and the second check is performed with
SyncRepLock. That's what you and I are expecting. Right?

ISTM that basically SyncRepLock is used in SyncRepWaitForLSN() and
SyncRepUpdateSyncStandbysDefined() to make operation on the queue
and enabling sync_standbys_defined atomic. Without lock, the issue that
the comment in SyncRepUpdateSyncStandbysDefined() explains would
happen. That is, the backend may keep waiting infinitely as follows.

Let me think the following sequence after applying your changes:

1. checkpointer calls SyncRepUpdateSyncStandbysDefined()
2. checkpointer sees that the flag indicates true but the config indicates false
3. checkpointer takes lock and wakes up all the waiters
4. backend calls SyncRepWaitForLSN() can see that the flag indicates true

Yes, I suppose this is the first check of sync_standbys_defined.

And before the second check, backend tries to acquire SyncRepLock but
since the lock is already being held by checkohpointer it must wait.

5. checkpointer sets the flag to false and releases the lock

After checkpointer release the lock, the backend is woken up.

6. backend adds itself to the queue and wait until it's waken up, but will not happen immediately

The backend sees that the flag has been false at the second check, so return.

If we didn't acquire SyncRepLock even for the second check I think the
backend would keep waiting infinitely as you mentioned.

So after the backend sees that the flag indicates true without lock,
it must check the flag again with lock immediately without operating
the queue. If this my understanding is right, I was thinking that
the comment should mention these things.

I think that's right. I was going to describe why we do the first
check without SyncRepLock and why it is safe but it seems to me that
these things you mentioned are related to the second check, if I'm not
missing something.

/*
* We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
* set. See SyncRepUpdateSyncStandbysDefined.
*
* Also check that the standby hasn't already replied. Unlikely race
* condition but we'll be fetching that cache line anyway so it's likely
* to be a low cost check.
*/
if (!WalSndCtl->sync_standbys_defined ||
lsn <= WalSndCtl->lsn[mode])
{
LWLockRelease(SyncRepLock);
return;
}

But since the
subsequent check will check it again while holding SyncRepLock, it's
no problem. Similarly even if we fail to notice that it has been
turned on

Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined()
sets the flag to true, SyncRepWaitForLSN() basically doesn't seem
to fail to notice that. No?

What I wanted to say is, in the current code, while the checkpointer
process is holding SyncRepLock to turn off sync_standbys_defined,
backends who reach SyncRepWaitForLSN() wait for the lock. Then, after
the checkpointer process releases SyncRepLock these backend can
enqueue themselves to the wait queue because they can see that
sync_standbys_defined is turned on.

In this case, since the checkpointer turned the flag off while holding
the lock, the backend sees that the flag is turned off, and doesn't
enqueue itself. No?

Oops, I had mistake here. It should be "while the checkpointer process
is holding SyncRepLock to turn *on* sync_standbys_defined, ...".

On the other hand if we do the
check without SyncRepLock, backends who reach SyncRepWaitForLSN() will
return instead of waiting, in spite of checkpointer process being
turning on sync_standbys_defined. Which means these backends are
failing to notice that it has been turned on, I thought.

No. Or I'm missing something... In this case, the backend sees that
the flag is turned on without lock since checkpointer turned it on.
So you're thinking the following. Right?

1. sync_standbys_defined flag is false
2. checkpointer takes the lock and turns the flag on
3. backend sees the flag
4. checkpointer releases the lock

In #3, the flag indicates true, I think. But you think it's false?

I meant the backends who reach SyncRepLock() while checkpointer is at
after acquiring the lock but before turning the flag on, described at
#3 step in the following sequence. Such backends will wait for the
lock in the current code, but after applying the patch they return
quickly. So what I'm thinking is:

1. sync_standbys_defined flag is false
2. checkpointer takes the lock
3. backend sees the flag, and return as it's still false
4. checkpointer turns the flag on
5. checkpointer releases the lock

If a backend reaches SyncRepWaitForLSN() between #4 and #5 it will
wait for the lock and then enqueue itself after acquiring the lock.
But such behavior is not changed before and after applying the patch.

Fujii-san, I think we agree on how to fix this issue and on the patch
you proposed so please add your comments.

This item is for PG14, right? If so I'd like to add this item to the
next commit fest.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14

Ashwin Agrawal

aagrawal@pivotal.io

over 5 years ago

In reply to: Masahiko Sawada (#13)

Re: SyncRepLock acquired exclusively in default configuration

On Mon, May 18, 2020 at 7:41 PM Masahiko Sawada <
masahiko.sawada@2ndquadrant.com> wrote:

This item is for PG14, right? If so I'd like to add this item to the
next commit fest.

Sure, add it to commit fest.
Seems though it should be backpatched to relevant branches as well.

#15

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Ashwin Agrawal (#14)

Re: SyncRepLock acquired exclusively in default configuration

On Tue, May 19, 2020 at 08:56:13AM -0700, Ashwin Agrawal wrote:

Sure, add it to commit fest.
Seems though it should be backpatched to relevant branches as well.

It does not seem to be listed yet. Are you planning to add it under
the section for bug fixes?
--
Michael

#16

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiko Sawada (#13)

Re: SyncRepLock acquired exclusively in default configuration

On 2020/05/19 11:41, Masahiko Sawada wrote:

On Sat, 11 Apr 2020 at 09:30, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Fri, 10 Apr 2020 at 21:52, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/10 20:56, Masahiko Sawada wrote:
On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/10 14:11, Masahiko Sawada wrote:
On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/04/08 3:01, Ashwin Agrawal wrote:

On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:

How about we change it to this ?

Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).

That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seems fine.

Bring back the check which existed based on GUC but instead of just blindly
returning based on just GUC not being set, check
WalSndCtl->sync_standbys_defined. Thoughts?

Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?

Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.

So the consensus is something like the following? Patch attached.
/*
-        * Fast exit if user has not requested sync replication.
+        * Fast exit if user has not requested sync replication, or there are no
+        * sync replication standby names defined.
*/
-       if (!SyncRepRequested())
+       if (!SyncRepRequested() ||
+               !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;
I think we need more comments describing why checking
sync_standby_defined without SyncRepLock is safe here. For example:
Yep, agreed!

This routine gets called every commit time. So, to check if the
synchronous standbys is defined as quick as possible we check
WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since
we make this test unlocked, there's a change we might fail to notice
that it has been turned off and continue processing.

Does this really happen? I was thinking that the problem by not taking
the lock here is that SyncRepWaitForLSN() can see that shared flag after
SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and
before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself
into the wait queue becaues the flag was true, without lock, it may keep
sleeping infinitely.
I think that because a backend does the following check after
acquiring SyncRepLock, in that case, once the backend has taken
SyncRepLock it can see that sync_standbys_defined is false and return.
Yes, but the backend can see that sync_standby_defined indicates false
whether holding SyncRepLock or not, after the checkpointer sets it to false.

But you meant that we do both checks without SyncRepLock?

Maybe No. The change that the latest patch provides should be applied, I think.
That is, sync_standbys_defined should be check without lock at first, then
only if it's true, it should be checked again with lock.
Yes. My understanding is the same.

After applying your patch, SyncRepWaitForLSN() is going to become
something like:

/*
* Fast exit if user has not requested sync replication, or there are no
* sync replication standby names defined.
*/
if (!SyncRepRequested() ||
!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;

Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
Assert(WalSndCtl != NULL);

LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);

/*
* We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
* set. See SyncRepUpdateSyncStandbysDefined.
*
* Also check that the standby hasn't already replied. Unlikely race
* condition but we'll be fetching that cache line anyway so it's likely
* to be a low cost check.
*/
if (!WalSndCtl->sync_standbys_defined ||
lsn <= WalSndCtl->lsn[mode])
{
LWLockRelease(SyncRepLock);
return;
}

/*
* Set our waitLSN so WALSender will know when to wake us, and add
* ourselves to the queue.
*/
MyProc->waitLSN = lsn;
MyProc->syncRepState = SYNC_REP_WAITING;
SyncRepQueueInsert(mode);
Assert(SyncRepQueueIsOrderedByLSN(mode));
LWLockRelease(SyncRepLock);

There are two checks of sync_standbys_defined. The first check is
performed without SyncRepLock and the second check is performed with
SyncRepLock. That's what you and I are expecting. Right?

ISTM that basically SyncRepLock is used in SyncRepWaitForLSN() and
SyncRepUpdateSyncStandbysDefined() to make operation on the queue
and enabling sync_standbys_defined atomic. Without lock, the issue that
the comment in SyncRepUpdateSyncStandbysDefined() explains would
happen. That is, the backend may keep waiting infinitely as follows.

Let me think the following sequence after applying your changes:

1. checkpointer calls SyncRepUpdateSyncStandbysDefined()
2. checkpointer sees that the flag indicates true but the config indicates false
3. checkpointer takes lock and wakes up all the waiters
4. backend calls SyncRepWaitForLSN() can see that the flag indicates true

Yes, I suppose this is the first check of sync_standbys_defined.

And before the second check, backend tries to acquire SyncRepLock but
since the lock is already being held by checkohpointer it must wait.

5. checkpointer sets the flag to false and releases the lock

After checkpointer release the lock, the backend is woken up.

6. backend adds itself to the queue and wait until it's waken up, but will not happen immediately

The backend sees that the flag has been false at the second check, so return.

If we didn't acquire SyncRepLock even for the second check I think the
backend would keep waiting infinitely as you mentioned.

So after the backend sees that the flag indicates true without lock,
it must check the flag again with lock immediately without operating
the queue. If this my understanding is right, I was thinking that
the comment should mention these things.

I think that's right. I was going to describe why we do the first
check without SyncRepLock and why it is safe but it seems to me that
these things you mentioned are related to the second check, if I'm not
missing something.

/*
* We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
* set. See SyncRepUpdateSyncStandbysDefined.
*
* Also check that the standby hasn't already replied. Unlikely race
* condition but we'll be fetching that cache line anyway so it's likely
* to be a low cost check.
*/
if (!WalSndCtl->sync_standbys_defined ||
lsn <= WalSndCtl->lsn[mode])
{
LWLockRelease(SyncRepLock);
return;
}

But since the
subsequent check will check it again while holding SyncRepLock, it's
no problem. Similarly even if we fail to notice that it has been
turned on

Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined()
sets the flag to true, SyncRepWaitForLSN() basically doesn't seem
to fail to notice that. No?

What I wanted to say is, in the current code, while the checkpointer
process is holding SyncRepLock to turn off sync_standbys_defined,
backends who reach SyncRepWaitForLSN() wait for the lock. Then, after
the checkpointer process releases SyncRepLock these backend can
enqueue themselves to the wait queue because they can see that
sync_standbys_defined is turned on.

In this case, since the checkpointer turned the flag off while holding
the lock, the backend sees that the flag is turned off, and doesn't
enqueue itself. No?

Oops, I had mistake here. It should be "while the checkpointer process
is holding SyncRepLock to turn *on* sync_standbys_defined, ...".

On the other hand if we do the
check without SyncRepLock, backends who reach SyncRepWaitForLSN() will
return instead of waiting, in spite of checkpointer process being
turning on sync_standbys_defined. Which means these backends are
failing to notice that it has been turned on, I thought.

No. Or I'm missing something... In this case, the backend sees that
the flag is turned on without lock since checkpointer turned it on.
So you're thinking the following. Right?

1. sync_standbys_defined flag is false
2. checkpointer takes the lock and turns the flag on
3. backend sees the flag
4. checkpointer releases the lock

In #3, the flag indicates true, I think. But you think it's false?

I meant the backends who reach SyncRepLock() while checkpointer is at
after acquiring the lock but before turning the flag on, described at
#3 step in the following sequence. Such backends will wait for the
lock in the current code, but after applying the patch they return
quickly. So what I'm thinking is:

1. sync_standbys_defined flag is false
2. checkpointer takes the lock
3. backend sees the flag, and return as it's still false
4. checkpointer turns the flag on
5. checkpointer releases the lock

If a backend reaches SyncRepWaitForLSN() between #4 and #5 it will
wait for the lock and then enqueue itself after acquiring the lock.
But such behavior is not changed before and after applying the patch.
Fujii-san, I think we agree on how to fix this issue and on the patch
you proposed so please add your comments.

Sorry for the late reply...

Regarding how to fix, don't we need memory barrier when reading
sync_standbys_defined? Without that, after SyncRepUpdateSyncStandbysDefined()
updates it to true, SyncRepWaitForLSN() can see the previous value,
i.e., false, and then exit out of the function. Is this right?
If this is right, we need memory barrier to avoid this issue?

This item is for PG14, right?

Yes!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#17

pasim@vmware.com

over 5 years ago

In reply to: Fujii Masao (#16)

Re: SyncRepLock acquired exclusively in default configuration

The proposed fix looks good, it resolves the lock contention problem as intended. +1 from my side.

On 09-Jul-2020, at 4:22 PM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Regarding how to fix, don't we need memory barrier when reading
sync_standbys_defined? Without that, after SyncRepUpdateSyncStandbysDefined()
updates it to true, SyncRepWaitForLSN() can see the previous value,
i.e., false, and then exit out of the function. Is this right?
If this is right, we need memory barrier to avoid this issue?

There is no out-of-order execution hazard in the scenario you are describing, memory barriers don’t seem to fit. Using locks to synchronise checkpointer process and a committing backend process is the right way. We have made a conscious decision to bypass the lock, which looks correct in this case.

As an aside, there is a small (?) window where a change to synchronous_standby_names GUC is partially propagated among committing backends, checkpointer and walsender. Such a window may result in walsender declaring a standby as synchronous while a commit backend fails to wait for it in SyncRepWaitForLSN. The root cause is walsender uses sync_standby_priority, a per-walsender variable to tell if a standby is synchronous. It is updated when walsender processes a config change. Whereas sync_standbys_defined, a variable updated by checkpointer, is used by committing backends to determine if they need to wait. If checkpointer is busy flushing buffers, it may take longer than walsender to reflect a change in sync_standbys_defined. This is a low impact problem, should be ok to live with it.

Asim

#18

Robert Haas

robertmhaas@gmail.com

over 5 years ago

In reply to: Asim Praveen (#17)

Re: SyncRepLock acquired exclusively in default configuration

On Tue, Aug 11, 2020 at 7:55 AM Asim Praveen <pasim@vmware.com> wrote:

There is no out-of-order execution hazard in the scenario you are describing, memory barriers don’t seem to fit. Using locks to synchronise checkpointer process and a committing backend process is the right way. We have made a conscious decision to bypass the lock, which looks correct in this case.

Yeah, I am not immediately seeing why a memory barrier would help anything here.

As an aside, there is a small (?) window where a change to synchronous_standby_names GUC is partially propagated among committing backends, checkpointer and walsender. Such a window may result in walsender declaring a standby as synchronous while a commit backend fails to wait for it in SyncRepWaitForLSN. The root cause is walsender uses sync_standby_priority, a per-walsender variable to tell if a standby is synchronous. It is updated when walsender processes a config change. Whereas sync_standbys_defined, a variable updated by checkpointer, is used by committing backends to determine if they need to wait. If checkpointer is busy flushing buffers, it may take longer than walsender to reflect a change in sync_standbys_defined. This is a low impact problem, should be ok to live with it.

I think this gets to the root of the issue. If we check the flag
without a lock, we might see a slightly stale value. But, considering
that there's no particular amount of time within which configuration
changes are guaranteed to take effect, maybe that's OK. However, there
is one potential gotcha here: if the walsender declares the standby to
be synchronous, a user can see that, right? So maybe there's this
problem: a user sees that the standby is synchronous and expects a
transaction committing afterward to provoke a wait, but really it
doesn't. Now the user is unhappy, feeling that the system didn't
perform according to expectations.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19

pasim@vmware.com

over 5 years ago

In reply to: Robert Haas (#18)

Re: SyncRepLock acquired exclusively in default configuration

On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this gets to the root of the issue. If we check the flag
without a lock, we might see a slightly stale value. But, considering
that there's no particular amount of time within which configuration
changes are guaranteed to take effect, maybe that's OK. However, there
is one potential gotcha here: if the walsender declares the standby to
be synchronous, a user can see that, right? So maybe there's this
problem: a user sees that the standby is synchronous and expects a
transaction committing afterward to provoke a wait, but really it
doesn't. Now the user is unhappy, feeling that the system didn't
perform according to expectations.

Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something other than 0.

The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_names GUC with either “*” or a standby name and then immediately promoting that standby? If the standby is promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commits made during this interval on master may not make it to standby. Upon promotion, those commits may be lost.

Asim

#20

masahiko.sawada@2ndquadrant.com

over 5 years ago

In reply to: Asim Praveen (#19)

Re: SyncRepLock acquired exclusively in default configuration

On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote:

On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this gets to the root of the issue. If we check the flag
without a lock, we might see a slightly stale value. But, considering
that there's no particular amount of time within which configuration
changes are guaranteed to take effect, maybe that's OK. However, there
is one potential gotcha here: if the walsender declares the standby to
be synchronous, a user can see that, right? So maybe there's this
problem: a user sees that the standby is synchronous and expects a
transaction committing afterward to provoke a wait, but really it
doesn't. Now the user is unhappy, feeling that the system didn't
perform according to expectations.

Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something other than 0.

The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_names GUC with either “*” or a standby name and then immediately promoting that standby? If the standby is promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commits made during this interval on master may not make it to standby. Upon promotion, those commits may be lost.

I think that if the standby is quite behind the primary and in case of
the primary crashes, the likelihood of losing commits might get
higher. The user can see the standby became synchronous standby via
pg_stat_replication but commit completes without a wait because the
checkpointer doesn't update sync_standbys_defined yet. If the primary
crashes before standby catching up and the user does failover, the
committed transaction will be lost, even though the user expects that
transaction commit has been replicated to the standby synchronously.
And this can happen even without the patch, right?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiko Sawada (#20)

Re: SyncRepLock acquired exclusively in default configuration

On 2020/08/12 15:32, Masahiko Sawada wrote:

On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote:

On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this gets to the root of the issue. If we check the flag
without a lock, we might see a slightly stale value. But, considering
that there's no particular amount of time within which configuration
changes are guaranteed to take effect, maybe that's OK. However, there
is one potential gotcha here: if the walsender declares the standby to
be synchronous, a user can see that, right? So maybe there's this
problem: a user sees that the standby is synchronous and expects a
transaction committing afterward to provoke a wait, but really it
doesn't. Now the user is unhappy, feeling that the system didn't
perform according to expectations.

Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something other than 0.

The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_names GUC with either “*” or a standby name and then immediately promoting that standby? If the standby is promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commits made during this interval on master may not make it to standby. Upon promotion, those commits may be lost.

I think that if the standby is quite behind the primary and in case of
the primary crashes, the likelihood of losing commits might get
higher. The user can see the standby became synchronous standby via
pg_stat_replication but commit completes without a wait because the
checkpointer doesn't update sync_standbys_defined yet. If the primary
crashes before standby catching up and the user does failover, the
committed transaction will be lost, even though the user expects that
transaction commit has been replicated to the standby synchronously.
And this can happen even without the patch, right?

I think you're right. This issue can happen even without the patch.

Maybe we should not mark the standby as "sync" whenever sync_standbys_defined
is false even if synchronous_standby_names is actually set and walsenders have
already detect that? Or we need more aggressive approach;
make the checkpointer update sync_standby_priority values of
all the walsenders? ISTM that the latter looks overkill...

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#22

pasim@vmware.com

over 5 years ago

In reply to: Masahiko Sawada (#20)

Re: SyncRepLock acquired exclusively in default configuration

On 12-Aug-2020, at 12:02 PM, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote:

On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this gets to the root of the issue. If we check the flag
without a lock, we might see a slightly stale value. But, considering
that there's no particular amount of time within which configuration
changes are guaranteed to take effect, maybe that's OK. However, there
is one potential gotcha here: if the walsender declares the standby to
be synchronous, a user can see that, right? So maybe there's this
problem: a user sees that the standby is synchronous and expects a
transaction committing afterward to provoke a wait, but really it
doesn't. Now the user is unhappy, feeling that the system didn't
perform according to expectations.

Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something other than 0.

The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_names GUC with either “*” or a standby name and then immediately promoting that standby? If the standby is promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commits made during this interval on master may not make it to standby. Upon promotion, those commits may be lost.

I think that if the standby is quite behind the primary and in case of
the primary crashes, the likelihood of losing commits might get
higher. The user can see the standby became synchronous standby via
pg_stat_replication but commit completes without a wait because the
checkpointer doesn't update sync_standbys_defined yet. If the primary
crashes before standby catching up and the user does failover, the
committed transaction will be lost, even though the user expects that
transaction commit has been replicated to the standby synchronously.
And this can happen even without the patch, right?

It is correct that the issue is orthogonal to the patch upthread and I’ve marked
the commitfest entry as ready-for-committer.

Regarding the issue described above, the amount by which the standby is lagging
behind the primary does not affect the severity. A standby’s state will be
reported as “sync” to the user only after the standby has caught up (state ==
WALSNDSTATE_STREAMING). The time it takes for the checkpointer to update the
sync_standbys_defined flag in shared memory is the important factor. Once
checkpointer sets this flag, commits start waiting for the standby (as long as
it is in-sync).

Asim

#23

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Asim Praveen (#22)

Re: SyncRepLock acquired exclusively in default configuration

At Wed, 19 Aug 2020 13:20:46 +0000, Asim Praveen <pasim@vmware.com> wrote in

On 12-Aug-2020, at 12:02 PM, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote:

On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this gets to the root of the issue. If we check the flag
without a lock, we might see a slightly stale value. But, considering
that there's no particular amount of time within which configuration
changes are guaranteed to take effect, maybe that's OK. However, there
is one potential gotcha here: if the walsender declares the standby to
be synchronous, a user can see that, right? So maybe there's this
problem: a user sees that the standby is synchronous and expects a
transaction committing afterward to provoke a wait, but really it
doesn't. Now the user is unhappy, feeling that the system didn't
perform according to expectations.

Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something other than 0.

The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_names GUC with either “*” or a standby name and then immediately promoting that standby? If the standby is promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commits made during this interval on master may not make it to standby. Upon promotion, those commits may be lost.

I think that if the standby is quite behind the primary and in case of
the primary crashes, the likelihood of losing commits might get
higher. The user can see the standby became synchronous standby via
pg_stat_replication but commit completes without a wait because the
checkpointer doesn't update sync_standbys_defined yet. If the primary
crashes before standby catching up and the user does failover, the
committed transaction will be lost, even though the user expects that
transaction commit has been replicated to the standby synchronously.
And this can happen even without the patch, right?

It is correct that the issue is orthogonal to the patch upthread and I’ve marked
the commitfest entry as ready-for-committer.

I find the name of SyncStandbysDefined macro is very confusing with
the struct member sync_standbys_defined, but that might be another
issue..

-	 * Fast exit if user has not requested sync replication.
+	 * Fast exit if user has not requested sync replication, or there are no
+	 * sync replication standby names defined.

This comment sounds like we just do that twice. The reason for the
check is to avoid wasteful exclusive locks on SyncRepLock, or to form
double-checked locking on the variable. I think we should explain that
here.

Regarding the issue described above, the amount by which the standby is lagging
behind the primary does not affect the severity. A standby’s state will be
reported as “sync” to the user only after the standby has caught up (state ==
WALSNDSTATE_STREAMING). The time it takes for the checkpointer to update the
sync_standbys_defined flag in shared memory is the important factor. Once
checkpointer sets this flag, commits start waiting for the standby (as long as
it is in-sync).

CATCHUP state is only entered at replication startup. It stays at
STREAMING when sync_sby_names is switched from '' to a valid name,
thus sync_state shows 'sync' even if checkpointer hasn't changed
sync_standbys_defined. If the standby being switched had a big lag,
the chance of losing commits getting larger (up to certain extent) for
the same extent of checkpointer lag.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#24

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Fujii Masao (#21)

Re: SyncRepLock acquired exclusively in default configuration

At Wed, 19 Aug 2020 21:41:03 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/08/12 15:32, Masahiko Sawada wrote:

On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote:

On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this gets to the root of the issue. If we check the flag
without a lock, we might see a slightly stale value. But, considering
that there's no particular amount of time within which configuration
changes are guaranteed to take effect, maybe that's OK. However, there
is one potential gotcha here: if the walsender declares the standby to
be synchronous, a user can see that, right? So maybe there's this
problem: a user sees that the standby is synchronous and expects a
transaction committing afterward to provoke a wait, but really it
doesn't. Now the user is unhappy, feeling that the system didn't
perform according to expectations.

Yes, pg_stat_replication reports a standby in sync as soon as
walsender updates priority of the standby to something other than 0.

The potential gotcha referred above doesn’t seem too severe. What is
the likelihood of someone setting synchronous_standby_names GUC with
either “*” or a standby name and then immediately promoting that
standby? If the standby is promoted before the checkpointer on master
gets a chance to update sync_standbys_defined in shared memory,
commits made during this interval on master may not make it to
standby. Upon promotion, those commits may be lost.

I think that if the standby is quite behind the primary and in case of
the primary crashes, the likelihood of losing commits might get
higher. The user can see the standby became synchronous standby via
pg_stat_replication but commit completes without a wait because the
checkpointer doesn't update sync_standbys_defined yet. If the primary
crashes before standby catching up and the user does failover, the
committed transaction will be lost, even though the user expects that
transaction commit has been replicated to the standby synchronously.
And this can happen even without the patch, right?

I think you're right. This issue can happen even without the patch.

Maybe we should not mark the standby as "sync" whenever
sync_standbys_defined
is false even if synchronous_standby_names is actually set and
walsenders have
already detect that? Or we need more aggressive approach;
make the checkpointer update sync_standby_priority values of
all the walsenders? ISTM that the latter looks overkill...

It seems to me that the issue here is what
pg_stat_replication.sync_status doens't show "the working state of the
walsdner", but "the state the walsender is commanded". Non-zero
WalSnd.sync_standby_priority is immediately considered as "I am in
sync" but actually it is "I am going to sync from async or am already
in sync". And it is precisely "..., or am already in sync if
checkpointer already notices any sync walsender exists".

1. if a walsender changes its state from async to sync, it should once
change its state back to "CATCHUP" or something like that.

2. pg_stat_replication.sync_status need to consider WalSnd.state
and WalSndCtlData.sync_standbys_defined.

We might be able to let SyncRepUpdateSyncStandbysDefined postpone
changing sync_standbys_defined until any sync standby actually comes,
but it would be complex.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#25

masahiko.sawada@2ndquadrant.com

over 5 years ago

In reply to: Fujii Masao (#21)

Re: SyncRepLock acquired exclusively in default configuration

On Wed, 19 Aug 2020 at 21:41, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/08/12 15:32, Masahiko Sawada wrote:

On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote:

On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this gets to the root of the issue. If we check the flag
without a lock, we might see a slightly stale value. But, considering
that there's no particular amount of time within which configuration
changes are guaranteed to take effect, maybe that's OK. However, there
is one potential gotcha here: if the walsender declares the standby to
be synchronous, a user can see that, right? So maybe there's this
problem: a user sees that the standby is synchronous and expects a
transaction committing afterward to provoke a wait, but really it
doesn't. Now the user is unhappy, feeling that the system didn't
perform according to expectations.

Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something other than 0.

The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_names GUC with either “*” or a standby name and then immediately promoting that standby? If the standby is promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commits made during this interval on master may not make it to standby. Upon promotion, those commits may be lost.

I think that if the standby is quite behind the primary and in case of
the primary crashes, the likelihood of losing commits might get
higher. The user can see the standby became synchronous standby via
pg_stat_replication but commit completes without a wait because the
checkpointer doesn't update sync_standbys_defined yet. If the primary
crashes before standby catching up and the user does failover, the
committed transaction will be lost, even though the user expects that
transaction commit has been replicated to the standby synchronously.
And this can happen even without the patch, right?

I think you're right. This issue can happen even without the patch.

Maybe we should not mark the standby as "sync" whenever sync_standbys_defined
is false even if synchronous_standby_names is actually set and walsenders have
already detect that?

It seems good. I guess that we can set 'async' to sync_status and 0 to
sync_priority when sync_standbys_defined is not true regardless of
walsender's actual priority value. We print the message "standby
\"%s\" now has synchronous standby priority %u" in SyncRepInitConfig()
regardless of sync_standbys_defined but maybe it's fine as the message
isn't incorrect and it's DEBUG1 message.

Or we need more aggressive approach;
make the checkpointer update sync_standby_priority values of
all the walsenders? ISTM that the latter looks overkill...

I think so too.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#26

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#23)

1 attachment(s)

Re: SyncRepLock acquired exclusively in default configuration

On 2020/08/20 11:29, Kyotaro Horiguchi wrote:

At Wed, 19 Aug 2020 13:20:46 +0000, Asim Praveen <pasim@vmware.com> wrote in

On 12-Aug-2020, at 12:02 PM, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:

On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote:

On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this gets to the root of the issue. If we check the flag
without a lock, we might see a slightly stale value. But, considering
that there's no particular amount of time within which configuration
changes are guaranteed to take effect, maybe that's OK. However, there
is one potential gotcha here: if the walsender declares the standby to
be synchronous, a user can see that, right? So maybe there's this
problem: a user sees that the standby is synchronous and expects a
transaction committing afterward to provoke a wait, but really it
doesn't. Now the user is unhappy, feeling that the system didn't
perform according to expectations.

Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something other than 0.

The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_names GUC with either “*” or a standby name and then immediately promoting that standby? If the standby is promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commits made during this interval on master may not make it to standby. Upon promotion, those commits may be lost.

I think that if the standby is quite behind the primary and in case of
the primary crashes, the likelihood of losing commits might get
higher. The user can see the standby became synchronous standby via
pg_stat_replication but commit completes without a wait because the
checkpointer doesn't update sync_standbys_defined yet. If the primary
crashes before standby catching up and the user does failover, the
committed transaction will be lost, even though the user expects that
transaction commit has been replicated to the standby synchronously.
And this can happen even without the patch, right?

It is correct that the issue is orthogonal to the patch upthread and I’ve marked
the commitfest entry as ready-for-committer.

Yes, thanks for the review!

I find the name of SyncStandbysDefined macro is very confusing with
the struct member sync_standbys_defined, but that might be another
issue..
-	 * Fast exit if user has not requested sync replication.
+	 * Fast exit if user has not requested sync replication, or there are no
+	 * sync replication standby names defined.
This comment sounds like we just do that twice. The reason for the
check is to avoid wasteful exclusive locks on SyncRepLock, or to form
double-checked locking on the variable. I think we should explain that
here.

I added the following comments based on the suggestion by Sawada-san upthread. Thought?

+	 * Since this routine gets called every commit time, it's important to
+	 * exit quickly if sync replication is not requested. So we check
+	 * WalSndCtl->sync_standbys_define without the lock and exit
+	 * immediately if it's false. If it's true, we check it again later
+	 * while holding the lock, to avoid the race condition described
+	 * in SyncRepUpdateSyncStandbysDefined().

Attached is the updated version of the patch. I didn't change how to
fix the issue. But I changed the check for fast exit so that it's called
before setting the "mode", to avoid a few cycle.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

syncreplock_v2.patchtext/plain; charset=UTF-8; name=syncreplock_v2.patch; x-mac-creator=0; x-mac-type=0Download

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index df1e341c76..c30ebaa0a6 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -158,18 +158,27 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	 */
 	Assert(InterruptHoldoffCount > 0);
 
+	/*
+	 * Fast exit if user has not requested sync replication, or there are no
+	 * sync replication standby names defined.
+	 *
+	 * Since this routine gets called every commit time, it's important to
+	 * exit quickly if sync replication is not requested. So we check
+	 * WalSndCtl->sync_standbys_define without the lock and exit
+	 * immediately if it's false. If it's true, we check it again later
+	 * while holding the lock, to avoid the race condition described
+	 * in SyncRepUpdateSyncStandbysDefined().
+	 */
+	if (!SyncRepRequested() ||
+		!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
+		return;
+
 	/* Cap the level for anything other than commit to remote flush only. */
 	if (commit)
 		mode = SyncRepWaitMode;
 	else
 		mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
 
-	/*
-	 * Fast exit if user has not requested sync replication.
-	 */
-	if (!SyncRepRequested())
-		return;
-
 	Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
 	Assert(WalSndCtl != NULL);

#27

pasim@vmware.com

over 5 years ago

In reply to: Fujii Masao (#26)

Re: SyncRepLock acquired exclusively in default configuration

On 26-Aug-2020, at 11:10 PM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I added the following comments based on the suggestion by Sawada-san upthread. Thought?

+	 * Since this routine gets called every commit time, it's important to
+	 * exit quickly if sync replication is not requested. So we check
+	 * WalSndCtl->sync_standbys_define without the lock and exit
+	 * immediately if it's false. If it's true, we check it again later
+	 * while holding the lock, to avoid the race condition described
+	 * in SyncRepUpdateSyncStandbysDefined().

+1. May I suggest the following addition to the above comment (feel free to
rephrase / reject)?

"If sync_standbys_defined was being set from false to true and we observe it as
false, it ok to skip the wait. Replication was async and it is in the process
of being changed to sync, due to user request. Subsequent commits will observe
the change and start waiting.”

Attached is the updated version of the patch. I didn't change how to
fix the issue. But I changed the check for fast exit so that it's called
before setting the "mode", to avoid a few cycle.

Looks good to me. There is a typo in the comment:

s/sync_standbys_define/sync_standbys_defined/

Asim

#28

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Asim Praveen (#27)

1 attachment(s)

Re: SyncRepLock acquired exclusively in default configuration

On 2020/08/27 15:59, Asim Praveen wrote:

On 26-Aug-2020, at 11:10 PM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I added the following comments based on the suggestion by Sawada-san upthread. Thought?
+	 * Since this routine gets called every commit time, it's important to
+	 * exit quickly if sync replication is not requested. So we check
+	 * WalSndCtl->sync_standbys_define without the lock and exit
+	 * immediately if it's false. If it's true, we check it again later
+	 * while holding the lock, to avoid the race condition described
+	 * in SyncRepUpdateSyncStandbysDefined().
+1. May I suggest the following addition to the above comment (feel free to
rephrase / reject)?

"If sync_standbys_defined was being set from false to true and we observe it as
false, it ok to skip the wait. Replication was async and it is in the process
of being changed to sync, due to user request. Subsequent commits will observe
the change and start waiting.”

Thanks for the suggestion! I'm not sure if it's worth adding this because
it seems obvious thing. But maybe you imply that we need to comment
why the lock is not necessary when sync_standbys_defined is false. Right?
If so, what about updating the comments as follows?

+	 * Since this routine gets called every commit time, it's important to
+	 * exit quickly if sync replication is not requested. So we check
+	 * WalSndCtl->sync_standbys_defined flag without the lock and exit
+	 * immediately if it's false. If it's true, we need to check it again later
+	 * while holding the lock, to check the flag and operate the sync rep
+	 * queue atomically. This is necessary to avoid the race condition
+	 * described in SyncRepUpdateSyncStandbysDefined(). On the other
+	 * hand, if it's false, the lock is not necessary because we don't touch
+	 * the queue.

Attached is the updated version of the patch. I didn't change how to
fix the issue. But I changed the check for fast exit so that it's called
before setting the "mode", to avoid a few cycle.

Looks good to me. There is a typo in the comment:

s/sync_standbys_define/sync_standbys_defined/

Fixed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

syncreplock_v3.patchtext/plain; charset=UTF-8; name=syncreplock_v3.patch; x-mac-creator=0; x-mac-type=0Download

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index df1e341c76..6e8c76537a 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -158,18 +158,30 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	 */
 	Assert(InterruptHoldoffCount > 0);
 
+	/*
+	 * Fast exit if user has not requested sync replication, or there are no
+	 * sync replication standby names defined.
+	 *
+	 * Since this routine gets called every commit time, it's important to
+	 * exit quickly if sync replication is not requested. So we check
+	 * WalSndCtl->sync_standbys_defined flag without the lock and exit
+	 * immediately if it's false. If it's true, we need to check it again later
+	 * while holding the lock, to check the flag and operate the sync rep
+	 * queue atomically. This is necessary to avoid the race condition
+	 * described in SyncRepUpdateSyncStandbysDefined(). On the other
+	 * hand, if it's false, the lock is not necessary because we don't touch
+	 * the queue.
+	 */
+	if (!SyncRepRequested() ||
+		!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
+		return;
+
 	/* Cap the level for anything other than commit to remote flush only. */
 	if (commit)
 		mode = SyncRepWaitMode;
 	else
 		mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
 
-	/*
-	 * Fast exit if user has not requested sync replication.
-	 */
-	if (!SyncRepRequested())
-		return;
-
 	Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
 	Assert(WalSndCtl != NULL);

#29

pasim@vmware.com

over 5 years ago

In reply to: Fujii Masao (#28)

Re: SyncRepLock acquired exclusively in default configuration

On 28-Aug-2020, at 7:03 AM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/08/27 15:59, Asim Praveen wrote:

+1. May I suggest the following addition to the above comment (feel free to
rephrase / reject)?
"If sync_standbys_defined was being set from false to true and we observe it as
false, it ok to skip the wait. Replication was async and it is in the process
of being changed to sync, due to user request. Subsequent commits will observe
the change and start waiting.”

Thanks for the suggestion! I'm not sure if it's worth adding this because
it seems obvious thing. But maybe you imply that we need to comment
why the lock is not necessary when sync_standbys_defined is false. Right?
If so, what about updating the comments as follows?
+	 * Since this routine gets called every commit time, it's important to
+	 * exit quickly if sync replication is not requested. So we check
+	 * WalSndCtl->sync_standbys_defined flag without the lock and exit
+	 * immediately if it's false. If it's true, we need to check it again later
+	 * while holding the lock, to check the flag and operate the sync rep
+	 * queue atomically. This is necessary to avoid the race condition
+	 * described in SyncRepUpdateSyncStandbysDefined(). On the other
+	 * hand, if it's false, the lock is not necessary because we don't touch
+	 * the queue.

Thank you for updating the comment. This looks better.

Asim

#30

masahiko.sawada@2ndquadrant.com

over 5 years ago

In reply to: Fujii Masao (#28)

Re: SyncRepLock acquired exclusively in default configuration

On Fri, 28 Aug 2020 at 10:33, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/08/27 15:59, Asim Praveen wrote:
On 26-Aug-2020, at 11:10 PM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I added the following comments based on the suggestion by Sawada-san upthread. Thought?
+     * Since this routine gets called every commit time, it's important to
+     * exit quickly if sync replication is not requested. So we check
+     * WalSndCtl->sync_standbys_define without the lock and exit
+     * immediately if it's false. If it's true, we check it again later
+     * while holding the lock, to avoid the race condition described
+     * in SyncRepUpdateSyncStandbysDefined().
+1. May I suggest the following addition to the above comment (feel free to
rephrase / reject)?

"If sync_standbys_defined was being set from false to true and we observe it as
false, it ok to skip the wait. Replication was async and it is in the process
of being changed to sync, due to user request. Subsequent commits will observe
the change and start waiting.”
Thanks for the suggestion! I'm not sure if it's worth adding this because
it seems obvious thing. But maybe you imply that we need to comment
why the lock is not necessary when sync_standbys_defined is false. Right?
If so, what about updating the comments as follows?
+        * Since this routine gets called every commit time, it's important to
+        * exit quickly if sync replication is not requested. So we check
+        * WalSndCtl->sync_standbys_defined flag without the lock and exit
+        * immediately if it's false. If it's true, we need to check it again later
+        * while holding the lock, to check the flag and operate the sync rep
+        * queue atomically. This is necessary to avoid the race condition
+        * described in SyncRepUpdateSyncStandbysDefined(). On the other
+        * hand, if it's false, the lock is not necessary because we don't touch
+        * the queue.
Attached is the updated version of the patch. I didn't change how to
fix the issue. But I changed the check for fast exit so that it's called
before setting the "mode", to avoid a few cycle.

Looks good to me. There is a typo in the comment:

s/sync_standbys_define/sync_standbys_defined/

Fixed. Thanks!

Both v2 and v3 look good to me too. IMO I'm okay with and without the
last sentence.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#31