More race conditions in logical replication
I noticed a recent failure that looked suspiciously like a race condition:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
The critical bit in the log file is
error running SQL: 'psql:<stdin>:1: ERROR: could not drop the replication slot "tap_sub" on publisher
DETAIL: The error was: ERROR: replication slot "tap_sub" is active for PID 3866790'
while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION tap_sub' at /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
After poking at it a bit, I found that I can cause several different
failures of this ilk in the subscription tests by injecting delays at
the points where a slot's active_pid is about to be cleared, as in the
attached patch (which also adds some extra printouts for debugging
purposes; none of that is meant for commit). It seems clear that there
is inadequate interlocking going on when we kill and restart a logical
rep worker: we're trying to start a new one before the old one has
gotten out of the slot.
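For illustration, the kind of delay injection meant here is roughly the following (a minimal sketch, not the attached patch; the exact spot in ReplicationSlotRelease() and the sleep length are assumptions):

/*
 * Widen the race window just before the slot is marked inactive, so that a
 * concurrent DROP SUBSCRIPTION reliably sees the slot as still active.
 * (Illustrative only.)
 */
pg_usleep(100 * 1000);      /* 100 ms, purely to provoke the failure */

SpinLockAcquire(&slot->mutex);
slot->active_pid = 0;       /* the slot only becomes available here */
SpinLockRelease(&slot->mutex);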
I'm not particularly interested in fixing this myself, so I'm just
going to add it to the open items list.
regards, tom lane
Attachments:
break_repl_slot_management.patch (text/x-diff)
On Sun, Jul 02, 2017 at 07:54:48PM -0400, Tom Lane wrote:
I noticed a recent failure that looked suspiciously like a race condition:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
The critical bit in the log file is
error running SQL: 'psql:<stdin>:1: ERROR: could not drop the replication slot "tap_sub" on publisher
DETAIL: The error was: ERROR: replication slot "tap_sub" is active for PID 3866790'
while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION tap_sub' at /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
After poking at it a bit, I found that I can cause several different
failures of this ilk in the subscription tests by injecting delays at
the points where a slot's active_pid is about to be cleared, as in the
attached patch (which also adds some extra printouts for debugging
purposes; none of that is meant for commit). It seems clear that there
is inadequate interlocking going on when we kill and restart a logical
rep worker: we're trying to start a new one before the old one has
gotten out of the slot.
I'm not particularly interested in fixing this myself, so I'm just
going to add it to the open items list.
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
Noah Misch wrote:
The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item.
I volunteer to own this item. My next update is going to be on or
before Friday 7th at 19:00 Chilean time, though I don't think I can have
this ready before then. I expect a fix for this to miss next week's
beta release, but I'll try to get it done sooner.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 03/07/17 01:54, Tom Lane wrote:
I noticed a recent failure that looked suspiciously like a race condition:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
The critical bit in the log file is
error running SQL: 'psql:<stdin>:1: ERROR: could not drop the replication slot "tap_sub" on publisher
DETAIL: The error was: ERROR: replication slot "tap_sub" is active for PID 3866790'
while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION tap_sub' at /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
After poking at it a bit, I found that I can cause several different
failures of this ilk in the subscription tests by injecting delays at
the points where a slot's active_pid is about to be cleared, as in the
attached patch (which also adds some extra printouts for debugging
purposes; none of that is meant for commit). It seems clear that there
is inadequate interlocking going on when we kill and restart a logical
rep worker: we're trying to start a new one before the old one has
gotten out of the slot.
Thanks for the test case.
It's not actually that we start the new worker too fast. It's that we try to
drop the slot right after the worker process was killed, but if the code that
clears the slot's active_pid takes too long, the slot still looks like it's
in use. I am quite sure it's possible to make this happen for physical
replication as well when using slots.
This is not something that can be solved by locking on the subscriber. ISTM
we need to make pg_drop_replication_slot behave more nicely, like making
it wait for the slot to become available (either by default or as an
option).
I'll have to think about how to do that without rewriting half of
replication slots or reimplementing a lock queue, though, because
replication slots don't use normal catalog access, so there is no object
locking with a wait queue. We could use a latch wait with a small timeout,
but that seems ugly, since the function can be called by a user on a slot
that is still actively in use, so the wait could be quite long (as in
"forever").
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/07/17 17:33, Petr Jelinek wrote:
On 03/07/17 01:54, Tom Lane wrote:
I noticed a recent failure that looked suspiciously like a race condition:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
The critical bit in the log file is
error running SQL: 'psql:<stdin>:1: ERROR: could not drop the replication slot "tap_sub" on publisher
DETAIL: The error was: ERROR: replication slot "tap_sub" is active for PID 3866790'
while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION tap_sub' at /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
After poking at it a bit, I found that I can cause several different
failures of this ilk in the subscription tests by injecting delays at
the points where a slot's active_pid is about to be cleared, as in the
attached patch (which also adds some extra printouts for debugging
purposes; none of that is meant for commit). It seems clear that there
is inadequate interlocking going on when we kill and restart a logical
rep worker: we're trying to start a new one before the old one has
gotten out of the slot.
Thanks for the test case.
It's not actually that we start new worker fast. It's that we try to
drop the slot right after worker process was killed but if the code that
clears slot's active_pid takes too long, it still looks like it's being
used. I am quite sure it's possible to make this happen for physical
replication as well when using slots.
This is not something that can be solved by locking on subscriber. ISTM
we need to make pg_drop_replication_slot behave more nicely, like making
it wait for the slot to become available (either by default or as an
option).
I'll have to think about how to do it without rewriting half of
replication slots or reimplementing lock queue though because the
replication slots don't use normal catalog access so there is no object
locking with wait queue. We could use some latch wait with small timeout
but that seems ugly as that function can be called by user without
having dropped the slot before so the wait can be quite long (as in
"forever").
A naive fix would be something like the attached. But as I said, it's not
exactly pretty.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Wait-for-slot-to-become-free-in-when-dropping-it.patch (text/x-patch)
On 06/07/17 18:20, Petr Jelinek wrote:
On 06/07/17 17:33, Petr Jelinek wrote:
On 03/07/17 01:54, Tom Lane wrote:
I noticed a recent failure that looked suspiciously like a race condition:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
The critical bit in the log file is
error running SQL: 'psql:<stdin>:1: ERROR: could not drop the replication slot "tap_sub" on publisher
DETAIL: The error was: ERROR: replication slot "tap_sub" is active for PID 3866790'
while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION tap_sub' at /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
After poking at it a bit, I found that I can cause several different
failures of this ilk in the subscription tests by injecting delays at
the points where a slot's active_pid is about to be cleared, as in the
attached patch (which also adds some extra printouts for debugging
purposes; none of that is meant for commit). It seems clear that there
is inadequate interlocking going on when we kill and restart a logical
rep worker: we're trying to start a new one before the old one has
gotten out of the slot.
Thanks for the test case.
It's not actually that we start new worker fast. It's that we try to
drop the slot right after worker process was killed but if the code that
clears slot's active_pid takes too long, it still looks like it's being
used. I am quite sure it's possible to make this happen for physical
replication as well when using slots.
This is not something that can be solved by locking on subscriber. ISTM
we need to make pg_drop_replication_slot behave more nicely, like making
it wait for the slot to become available (either by default or as an
option).
I'll have to think about how to do it without rewriting half of
replication slots or reimplementing lock queue though because the
replication slots don't use normal catalog access so there is no object
locking with wait queue. We could use some latch wait with small timeout
but that seems ugly as that function can be called by user without
having dropped the slot before so the wait can be quite long (as in
"forever").Naive fix would be something like attached. But as I said, it's not
exactly pretty.
So the best idea I could come up with is to make use of the new condition
variable API. That lets us wait on a variable which can be set per slot.
It's not backportable, however; I am not sure how big a problem that is
considering the lack of complaints until now (maybe we could backport
using the ugly timeout version?).
The attached patch is a prototype of such a solution, and there are still
some race conditions (the variable can get signaled before the waiting
process starts sleeping on it). I am mainly sending it to get feedback on
the approach.
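To give an idea of the shape of it (a rough sketch only, assuming a per-slot condition variable field here called active_cv; the attached prototype may differ in detail):

/* In ReplicationSlot (shared memory): a per-slot condition variable. */
ConditionVariable   active_cv;              /* assumed field name */

/* When releasing the slot: clear active_pid, then wake any waiters. */
SpinLockAcquire(&slot->mutex);
slot->active_pid = 0;
SpinLockRelease(&slot->mutex);
ConditionVariableBroadcast(&slot->active_cv);

/* When dropping (or acquiring) the slot: wait until nobody holds it. */
for (;;)
{
    pid_t       active_pid;

    SpinLockAcquire(&slot->mutex);
    active_pid = slot->active_pid;
    SpinLockRelease(&slot->mutex);

    if (active_pid == 0)
        break;

    /* wait event chosen arbitrarily for the sketch */
    ConditionVariableSleep(&slot->active_cv, PG_WAIT_LOCK);
}
ConditionVariableCancelSleep();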
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v2-0001-Wait-for-slot-to-become-free-in-when-dropping-it.patch (text/x-patch)
Petr Jelinek wrote:
So best idea I could come up with is to make use of the new condition
variable API. That lets us wait for variable which can be set per slot.
It's not backportable however, I am not sure how big problem that is
considering the lack of complaints until now (maybe we could backport
using the ugly timeout version?).
The attached patch is a prototype of such solution and there are some
race conditions (variable can get signaled before the waiting process
starts sleeping for it). I am mainly sending it to get feedback on the
approach.
This one looks a lot better than the original one.
I wonder if it's possible to do this using LWLocks instead of condition
variables (similar to how we do the "wait for I/O" thing for both slots
and buffers). We could backport that easily ...
I'll next update this on or before Monday 10th at 19:00 CLT.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I'll next update this on or before Monday 10th at 19:00 CLT.
Schedule note --- we'll be wrapping beta2 on Monday, a couple hours
before that. Although it'd be great to have this issue fixed before
beta2, jamming in a patch just a few hours before wrap is probably
not prudent. If you can't get it done over the weekend, I'd counsel
holding off till after beta2 is tagged.
regards, tom lane
Alvaro Herrera wrote:
I'll next update this on or before Monday 10th at 19:00 CLT.
I couldn't get to this today as I wanted. Next update on Wednesday
12th, same time.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Petr Jelinek wrote:
So best idea I could come up with is to make use of the new condition
variable API. That lets us wait for variable which can be set per slot.
Here's a cleaned up version of that patch, which I intend to get in the
tree tomorrow. I verified that this allows the subscription tests to
pass with Tom's sleep additions.
I noticed one point where we're reading the active_pid without locking;
marked it with an XXX comment. Not yet sure whether this is a bug or
not.
I noticed something funny in CREATE_REPLICATION_SLOT; apparently we first
create a slot and set it active, then we activate it by name. With the
current coding it seems to work fine, because we're careful to make
activation idempotent, but it looks convoluted. I don't think this is
new, though.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v3-0001-Wait-for-slot-to-become-free-in-when-dropping-it.patch (text/plain)
On Wed, Jul 12, 2017 at 06:48:28PM -0400, Alvaro Herrera wrote:
Petr Jelinek wrote:
So best idea I could come up with is to make use of the new condition
variable API. That lets us wait for variable which can be set per slot.
Here's a cleaned up version of that patch, which I intend to get in the
tree tomorrow. I verified that this allows the subscription tests to
pass with Tom's sleep additions.
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
After going over this a bunch more times, I found other problems. For
example, I noticed that if I create a temporary slot in one session,
then another session would rightly go to sleep if it tries to drop that
slot. But the slot goes away when the first session disconnects, so I'd
expect the sleeping session to get a signal and wake up, but that
doesn't happen.
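Presumably the at-disconnect cleanup of temporary slots needs to broadcast on the slot's condition variable too, something like this (a sketch only; function and field names are the ones assumed elsewhere in this thread, not necessarily what the final patch will do):

/*
 * Sketch for the temporary-slot cleanup path run at backend exit
 * (ReplicationSlotCleanup()): after this backend's temporary slot is
 * dropped, wake anyone sleeping on it; otherwise the sleeper never
 * notices that the slot is gone.
 */
if (s->active_pid == MyProcPid && s->data.persistency == RS_TEMPORARY)
{
    ReplicationSlotDropPtr(s);                  /* slot no longer exists */
    ConditionVariableBroadcast(&s->active_cv);  /* wake up any waiters   */
}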
I'm going to continue to look at this and report back Tuesday 18th.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:
After going over this a bunch more times, I found other problems. For
example, I noticed that if I create a temporary slot in one session,
then another session would rightly go to sleep if it tries to drop that
slot. But the slot goes away when the first session disconnects, so I'd
expect the sleeping session to get a signal and wake up, but that
doesn't happen.
I'm going to continue to look at this and report back Tuesday 18th.
I got tangled up in a really tough problem this week, so I won't have
time to work on this for a couple of days. I can next update tomorrow
19:00 CLT (probably not with a final fix yet, though).
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I'm back at looking into this again, after a rather exhausting week. I
think I have a better grasp of what is going on in this code now, and it
appears to me that we should change the locking so that active_pid is
protected by ReplicationSlotControlLock instead of the slot's spinlock;
but I haven't analyzed the idea fully yet and I don't have the patch
done yet either. I'm going to report, hopefully with a fix committed
this time, on Monday at 19:00 CLT.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:
I'm back at looking into this again, after a rather exhausting week. I
think I have a better grasp of what is going on in this code now,
Here's an updated patch, which I intend to push tomorrow.
and it
appears to me that we should change the locking so that active_pid is
protected by ReplicationSlotControlLock instead of the slot's spinlock;
but I haven't analyzed the idea fully yet and I don't have the patch
done yet either.
This doesn't seem to be a good idea actually.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v4-0001-Wait-for-slot-to-become-free-in-when-dropping-it.patch (text/plain)
On 25/07/17 01:33, Alvaro Herrera wrote:
Alvaro Herrera wrote:
I'm back at looking into this again, after a rather exhausting week. I
think I have a better grasp of what is going on in this code now,
Here's an updated patch, which I intend to push tomorrow.
I think the ConditionVariablePrepareToSleep() call in
ReplicationSlotAcquire() needs to be done while holding the slot's mutex;
otherwise there is a tiny race condition where the process which has the
slot acquired might release it between the mutex unlock and the
ConditionVariablePrepareToSleep() call, which means we'll never get
signaled and ConditionVariableSleep() will wait forever.
Otherwise the patch looks good to me.
As a side note, ConditionVariablePrepareToSleep()'s comment could be
improved, because currently it says the only advantage is that we skip the
double test at the beginning of ConditionVariableSleep(). But that's not
the whole story: it's essential for preventing race conditions like the
one above, because it puts the current process onto the wait list, so we
can be sure it will be signaled on broadcast once
ConditionVariablePrepareToSleep() has been called.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek wrote:
On 25/07/17 01:33, Alvaro Herrera wrote:
Alvaro Herrera wrote:
I'm back at looking into this again, after a rather exhausting week. I
think I have a better grasp of what is going on in this code now,
Here's an updated patch, which I intend to push tomorrow.
I think the ConditionVariablePrepareToSleep() call in
ReplicationSlotAcquire() needs to be done inside the mutex lock
otherwise there is tiny race condition where the process which has slot
acquired might release the slot between the mutex unlock and the
ConditionVariablePrepareToSleep() call which means we'll never get
signaled and ConditionVariableSleep() will wait forever.
Hmm, yeah, that's not good. However, I didn't like the idea of putting
it inside the locked area, as that's too much code. Instead I added it
just before acquiring the spinlock. We cancel the sleep unconditionally
later on if we didn't need to sleep after all.
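In other words, roughly this ordering (a sketch, with the per-slot condition variable field assumed to be called active_cv):

/*
 * Register as a waiter before checking active_pid, so a broadcast that
 * fires between the check and the sleep cannot be missed.
 */
ConditionVariablePrepareToSleep(&slot->active_cv);

SpinLockAcquire(&slot->mutex);
active_pid = slot->active_pid;
SpinLockRelease(&slot->mutex);

if (active_pid != 0)
    ConditionVariableSleep(&slot->active_cv, PG_WAIT_LOCK);

/* Cancel unconditionally; harmless if we never actually slept. */
ConditionVariableCancelSleep();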
As I see it, we need to backpatch at least parts of this patch. I've
received reports that in earlier branches pglogical and BDR can
sometimes leave slots behind when removing nodes, and I have a hunch
that it can be explained by the bugs being fixed here. Now we cannot
use condition variables in back-branches, so we'll need to figure out
how to actually do it ...
As a side note, the ConditionVariablePrepareToSleep()'s comment could be
improved because currently it says the only advantage is that we skip
double-test in the beginning of ConditionVariableSleep(). But that's not
true, it's essential for preventing race conditions like the one above
because it puts the current process into waiting list so we can be sure
it will be signaled on broadcast once ConditionVariablePrepareToSleep()
has been called.
Hmm.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jul 25, 2017 at 1:42 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
As I see it, we need to backpatch at least parts of this patch. I've
received reports that in earlier branches pglogical and BDR can
sometimes leave slots behind when removing nodes, and I have a hunch
that it can be explained by the bugs being fixed here. Now we cannot
use condition variables in back-branches, so we'll need to figure out
how to actually do it ...
If all you had to back-patch was the condition variable code itself,
that might not really be all that bad, but it depends on the
WaitEventSet stuff, which I think is far too dangerous to back-patch.
However, you might be able to create a dumber, slower version that
only uses WaitLatch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jul 25, 2017 at 5:47 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
As a side note, the ConditionVariablePrepareToSleep()'s comment could be
improved because currently it says the only advantage is that we skip
double-test in the beginning of ConditionVariableSleep(). But that's not
true, it's essential for preventing race conditions like the one above
because it puts the current process into waiting list so we can be sure
it will be signaled on broadcast once ConditionVariablePrepareToSleep()
has been called.
But if you don't call ConditionVariablePrepareToSleep() before calling
ConditionVariableSleep(), then the first call to the latter will call
the former and return without doing anything else. So I don't see how
this can ever go wrong if you're using these primitives as documented.
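For reference, the documented pattern is just this (illustrative sketch; condition_is_satisfied() and the wait event stand in for whatever the caller needs):

/*
 * The first ConditionVariableSleep() call only registers us as a waiter
 * and returns immediately; we then re-test the condition, so a wakeup
 * between the test and the sleep cannot be lost.
 */
for (;;)
{
    if (condition_is_satisfied())
        break;
    ConditionVariableSleep(&cv, PG_WAIT_EXTENSION);
}
ConditionVariableCancelSleep();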
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Alvaro Herrera wrote:
Hmm, yeah, that's not good. However, I didn't like the idea of putting
it inside the locked area, as it's too much code. Instead I added just
before acquiring the spinlock. We cancel the sleep unconditionally
later on if we didn't need to sleep after all.
I just noticed that Jacana failed again in the subscription test, and it
looks like it's related to this. I'll take a look tomorrow if no one
beats me to it.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2017-07-26%2014%3A39%3A54
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services