Replication slot drop message is sent after pgstats shutdown.

Started by Masahiko Sawada over 4 years ago, 20 messages
#1 Masahiko Sawada
sawada.mshk@gmail.com

Hi all,

I found another path where we report stats after the stats collector
shutdown. The reproducer and the backtrace I got are here:

1. psql -c "begin; create table a (a int); select pg_sleep(30); commit;" &
2. pg_recvlogical --create-slot -S slot -d postgres &
3. stop the server

TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown",
File: "pgstat.c", Line: 4752, PID: 62789)
0 postgres 0x000000010a8ed79a
ExceptionalCondition + 234
1 postgres 0x000000010a5e03d2
pgstat_assert_is_up + 66
2 postgres 0x000000010a5e1dc4 pgstat_send + 20
3 postgres 0x000000010a5e1d5c
pgstat_report_replslot_drop + 108
4 postgres 0x000000010a64c796
ReplicationSlotDropPtr + 838
5 postgres 0x000000010a64c0e9
ReplicationSlotDropAcquired + 89
6 postgres 0x000000010a64bf23
ReplicationSlotRelease + 99
7 postgres 0x000000010a6d60ab ProcKill + 219
8 postgres 0x000000010a6a350c shmem_exit + 444
9 postgres 0x000000010a6a326a
proc_exit_prepare + 122
10 postgres 0x000000010a6a3163 proc_exit + 19
11 postgres 0x000000010a8ee665 errfinish + 1109
12 postgres 0x000000010a6e3535
ProcessInterrupts + 1445
13 postgres 0x000000010a65f654
WalSndWaitForWal + 164
14 postgres 0x000000010a65edb2
logical_read_xlog_page + 146
15 postgres 0x000000010a22c336
ReadPageInternal + 518
16 postgres 0x000000010a22b860 XLogReadRecord + 320
17 postgres 0x000000010a619c67
DecodingContextFindStartpoint + 231
18 postgres 0x000000010a65c105
CreateReplicationSlot + 1237
19 postgres 0x000000010a65b64c
exec_replication_command + 1180
20 postgres 0x000000010a6e6d2b PostgresMain + 2459
21 postgres 0x000000010a5ef1a9 BackendRun + 89
22 postgres 0x000000010a5ee6fd BackendStartup + 557
23 postgres 0x000000010a5ed487 ServerLoop + 759
24 postgres 0x000000010a5eac22 PostmasterMain + 6610
25 postgres 0x000000010a4c32d3 main + 819
26 libdyld.dylib 0x00007fff73477cc9 start + 1

At step #2, the walsender waits for the transaction started at step
#1 to complete after creating the replication slot. When the server is
stopping, the walsender process drops the slot while releasing it,
since the slot is still RS_EPHEMERAL. Then, after dropping the slot, we
send the message for dropping the slot (see ReplicationSlotDropPtr()).
These are executed in ReplicationSlotRelease(), called by ProcKill(),
which runs among the on_shmem_exit callbacks, i.e. after pgstats has
already been shut down by the before_shmem_exit callbacks. I've not
tested it yet, but I think this can potentially also happen when
dropping a temporary slot, since ProcKill() also calls
ReplicationSlotCleanup() to clean up temporary slots.
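
To make the ordering concrete, here is a toy, self-contained sketch of
the two exit-callback phases. The names mirror ipc.c, but this is not
the actual implementation, only an illustration of why the stats
message ends up being sent too late:

/*
 * Illustration only: before_shmem_exit callbacks (where pgstats shuts
 * down) run before on_shmem_exit callbacks (where ProcKill used to
 * release the slot).
 */
#include <stdio.h>

typedef void (*exit_cb) (int code);

#define MAX_CBS 8
static exit_cb before_list[MAX_CBS];
static int	before_count = 0;
static exit_cb on_list[MAX_CBS];
static int	on_count = 0;

static int	pgstat_is_shutdown = 0;

static void
before_shmem_exit(exit_cb cb)
{
	before_list[before_count++] = cb;
}

static void
on_shmem_exit(exit_cb cb)
{
	on_list[on_count++] = cb;
}

static void
pgstat_shutdown(int code)
{
	(void) code;
	pgstat_is_shutdown = 1;
	printf("before_shmem_exit: pgstats shut down\n");
}

static void
ProcKill_sketch(int code)
{
	(void) code;
	/*
	 * Releasing a still-RS_EPHEMERAL slot from here also drops it and
	 * tries to send a stats message -- but pgstats is already gone.
	 */
	printf("on_shmem_exit: ProcKill releases slot, pgstat_is_shutdown=%d\n",
		   pgstat_is_shutdown);
}

static void
shmem_exit(int code)
{
	/* before_shmem_exit callbacks run first, last-registered first ... */
	while (before_count > 0)
		before_list[--before_count] (code);
	/* ... and only then the on_shmem_exit callbacks. */
	while (on_count > 0)
		on_list[--on_count] (code);
}

int
main(void)
{
	on_shmem_exit(ProcKill_sketch);
	before_shmem_exit(pgstat_shutdown);
	shmem_exit(0);
	return 0;
}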

There are some ideas to fix this issue, but I don't think it's a good
idea to move either ProcKill() or the slot-releasing code to
before_shmem_exit in this case, as we did for other similar
issues[1][2]. Reporting the slot drop message when the slot is dropped
isn't strictly essential, since autovacuum periodically checks for
already-dropped slots and reports that their stats should be dropped.
So another idea would be to move pgstat_report_replslot_drop() to a
higher layer such as ReplicationSlotDrop() and
ReplicationSlotsDropDBSlots(), which are not called from exit
callbacks. The replication slot stats would then be dropped when the
slot is dropped via commands such as pg_drop_replication_slot() and
DROP_REPLICATION_SLOT. For temporary and ephemeral slots, on the other
hand, we would rely on autovacuum to drop their stats. Even if dropping
the stats for those slots is delayed, pg_stat_replication_slots doesn't
show stats for already-dropped slots.
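
As a rough, untested sketch (not something I'm proposing to commit
as-is), the call could simply be hoisted out of ReplicationSlotDropPtr()
into the command-level entry point, for example:

/*
 * Sketch only: report the stats drop from the command-level entry point
 * rather than from ReplicationSlotDropPtr(), so that exit-time callbacks
 * never send a pgstats message.
 */
void
ReplicationSlotDrop(const char *name, bool nowait)
{
	Assert(MyReplicationSlot == NULL);

	ReplicationSlotAcquire(name, nowait);
	ReplicationSlotDropAcquired();

	/* moved here from ReplicationSlotDropPtr() in this sketch */
	pgstat_report_replslot_drop(name);
}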

Any other ideas?

Regards,

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=675c945394b36c2db0e8c8c9f6209c131ce3f0a8
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dcac5e7ac157964f71f15d81c7429130c69c3f9b

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#2 Andres Freund
andres@anarazel.de
In reply to: Masahiko Sawada (#1)
Re: Replication slot drop message is sent after pgstats shutdown.

Hi,

On 2021-08-31 11:37:08 +0900, Masahiko Sawada wrote:

At step #2, wal sender waits for another transaction started at step
#1 to complete after creating the replication slot. When the server is
stopping, wal sender process drops the slot on releasing the slot
since it's still RS_EPHEMERAL. Then, after dropping the slot we report
the message for dropping the slot (see ReplicationSlotDropPtr()).
These are executed in ReplicationSlotRelease() called by ProcKill()
which is called during calling on_shmem_exit callbacks, which is after
shutting down pgstats during before_shmem_exit callbacks. I’ve not
tested yet but I think this can potentially happen also when dropping
a temporary slot. ProcKill() also calls ReplicationSlotCleanup() to
clean up temporary slots.

There are some ideas to fix this issue but I don’t think it’s a good
idea to move either ProcKill() or the slot releasing code to
before_shmem_exit in this case, like we did for other similar
issues[1][2].

Yea, that's clearly not an option.

I wonder why the replication slot stuff is in ProcKill() rather than its
own callback. That's probably my fault, but I don't remember what led
to that.

Reporting the slot dropping message on dropping the slot
isn’t necessarily essential actually since autovacuums periodically
check already-dropped slots and report to drop the stats. So another
idea would be to move pgstat_report_replslot_drop() to a higher layer
such as ReplicationSlotDrop() and ReplicationSlotsDropDBSlots() that
are not called during callbacks. The replication slot stats are
dropped when it’s dropped via commands such as
pg_drop_replication_slot() and DROP_REPLICATION_SLOT. On the other
hand, for temporary slots and ephemeral slots, we rely on autovacuums
to drop their stats. Even if we delay to drop the stats for those
slots, pg_stat_replication_slots don’t show the stats for
already-dropped slots.

Yea, we could do that, but I think it'd be nicer to find a bit more
principled solution...

Perhaps moving this stuff out from ProcKill() into its own
before_shmem_exit() callback would do the trick?

Greetings,

Andres Freund

#3 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Andres Freund (#2)
Re: Replication slot drop message is sent after pgstats shutdown.

On Tue, Aug 31, 2021 at 12:45 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-08-31 11:37:08 +0900, Masahiko Sawada wrote:

At step #2, wal sender waits for another transaction started at step
#1 to complete after creating the replication slot. When the server is
stopping, wal sender process drops the slot on releasing the slot
since it's still RS_EPHEMERAL. Then, after dropping the slot we report
the message for dropping the slot (see ReplicationSlotDropPtr()).
These are executed in ReplicationSlotRelease() called by ProcKill()
which is called during calling on_shmem_exit callbacks, which is after
shutting down pgstats during before_shmem_exit callbacks. I’ve not
tested yet but I think this can potentially happen also when dropping
a temporary slot. ProcKill() also calls ReplicationSlotCleanup() to
clean up temporary slots.

There are some ideas to fix this issue but I don’t think it’s a good
idea to move either ProcKill() or the slot releasing code to
before_shmem_exit in this case, like we did for other similar
issues[1][2].

Yea, that's clearly not an option.

I wonder why the replication slot stuff is in ProcKill() rather than its
own callback. That's probably my fault, but I don't remember what lead
to that.

Reporting the slot dropping message on dropping the slot
isn’t necessarily essential actually since autovacuums periodically
check already-dropped slots and report to drop the stats. So another
idea would be to move pgstat_report_replslot_drop() to a higher layer
such as ReplicationSlotDrop() and ReplicationSlotsDropDBSlots() that
are not called during callbacks. The replication slot stats are
dropped when it’s dropped via commands such as
pg_drop_replication_slot() and DROP_REPLICATION_SLOT. On the other
hand, for temporary slots and ephemeral slots, we rely on autovacuums
to drop their stats. Even if we delay to drop the stats for those
slots, pg_stat_replication_slots don’t show the stats for
already-dropped slots.

Yea, we could do that, but I think it'd be nicer to find a bit more
principled solution...

Perhaps moving this stuff out from ProcKill() into its own
before_shmem_exit() callback would do the trick?

You mean to move only the part of sending the message to its own
before_shmem_exit() callback? or move ReplicationSlotRelease() and
ReplicationSlotCleanup() from ProcKill() to it?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#4 Andres Freund
andres@anarazel.de
In reply to: Masahiko Sawada (#3)
Re: Replication slot drop message is sent after pgstats shutdown.

On 2021-08-31 14:22:39 +0900, Masahiko Sawada wrote:

You mean to move only the part of sending the message to its own
before_shmem_exit() callback? or move ReplicationSlotRelease() and
ReplicationSlotCleanup() from ProcKill() to it?

The latter.

#5 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Andres Freund (#4)
1 attachment(s)
Re: Replication slot drop message is sent after pgstats shutdown.

On Tue, Aug 31, 2021 at 2:34 PM Andres Freund <andres@anarazel.de> wrote:

On 2021-08-31 14:22:39 +0900, Masahiko Sawada wrote:

You mean to move only the part of sending the message to its own
before_shmem_exit() callback? or move ReplicationSlotRelease() and
ReplicationSlotCleanup() from ProcKill() to it?

The latter.

Makes sense.

I've attached a patch that moves them to their own
before_shmem_exit() callback. Unless I missed registering the callback
somewhere, it works the same as before, except for where the slots are
released and cleaned up.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

move_slot_release_to_before_shmem_exit.patch (application/octet-stream)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 33e9acab4a..f2ae497266 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -46,6 +46,7 @@
 #include "pgstat.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
+#include "storage/ipc.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
@@ -101,6 +102,8 @@ int			max_replication_slots = 0;	/* the maximum number of replication
 
 static void ReplicationSlotDropAcquired(void);
 static void ReplicationSlotDropPtr(ReplicationSlot *slot);
+static void RegisterReplicationSlotCallback(void);
+static void ReplicationSlotBeforeShmemExit(int code, Datum arg);
 
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
@@ -301,6 +304,12 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	CreateSlotOnDisk(slot);
 
+	/*
+	 * Register callback to make sure cleanup and releasing the replication
+	 * slot on exit.
+	 */
+	RegisterReplicationSlotCallback();
+
 	/*
 	 * We need to briefly prevent any other backend from iterating over the
 	 * slots while we flip the in_use flag. We also need to set the active
@@ -455,6 +464,9 @@ retry:
 	/* Let everybody know we've modified this slot */
 	ConditionVariableBroadcast(&s->active_cv);
 
+	/* Register callback to make sure releasing the replication slot on exit */
+	RegisterReplicationSlotCallback();
+
 	/* We made this slot active, so it's ours now. */
 	MyReplicationSlot = s;
 }
@@ -702,6 +714,45 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	LWLockRelease(ReplicationSlotAllocationLock);
 }
 
+/*
+ * Register the callback for replication slot cleanup and releasing if not
+ * registered yet.
+ */
+static void
+RegisterReplicationSlotCallback(void)
+{
+	static bool callback_registered = false;
+
+	if (callback_registered)
+		return;
+
+	/* The callback must be registered before acquiring a replication slot */
+	Assert(MyReplicationSlot == NULL);
+
+	/*
+	 * Register before-shmem-exit hook to ensure releasing and cleanup
+	 * replication slots while we can still report stats. On dropping a
+	 * replication slot, we report the message to drop the replication slot
+	 * stats.
+	 */
+	before_shmem_exit(ReplicationSlotBeforeShmemExit, 0);
+	callback_registered = true;
+}
+
+/*
+ * Release and cleanup replication slots.
+ */
+static void
+ReplicationSlotBeforeShmemExit(int code, Datum arg)
+{
+	/* Make sure active replication slots are released */
+	if (MyReplicationSlot != NULL)
+		ReplicationSlotRelease();
+
+	/* Also cleanup all the temporary slots. */
+	ReplicationSlotCleanup();
+}
+
 /*
  * Serialize the currently acquired slot's state from memory to disk, thereby
  * guaranteeing the current state will survive a crash.
@@ -972,6 +1023,13 @@ ReplicationSlotsDropDBSlots(Oid dboid)
 	if (max_replication_slots <= 0)
 		return;
 
+	/*
+	 * While dropping replication slots, we acquire a slot to reuse
+	 * ReplicationSlotDropAcquired().  So we register callback to make sure
+	 * releasing the replication slot on exit.
+	 */
+	RegisterReplicationSlotCallback();
+
 restart:
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 	for (i = 0; i < max_replication_slots; i++)
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index b7d9da0aa9..031a73104e 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -847,13 +847,6 @@ ProcKill(int code, Datum arg)
 	/* Cancel any pending condition variable sleep, too */
 	ConditionVariableCancelSleep();
 
-	/* Make sure active replication slots are released */
-	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
-
-	/* Also cleanup all the temporary slots. */
-	ReplicationSlotCleanup();
-
 	/*
 	 * Detach from any lock group of which we are a member.  If the leader
 	 * exist before all other group members, its PGPROC will remain allocated
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 58b5960e27..36e39c50c0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4249,8 +4249,9 @@ PostgresMain(int argc, char *argv[],
 		 * We can't release replication slots inside AbortTransaction() as we
 		 * need to be able to start and abort transactions while having a slot
 		 * acquired. But we never need to hold them across top level errors,
-		 * so releasing here is fine. There's another cleanup in ProcKill()
-		 * ensuring we'll correctly cleanup on FATAL errors as well.
+		 * so releasing here is fine. There's another cleanup in
+		 * ReplicationSlotBeforeShmemExit() callback ensuring we'll correctly
+		 * cleanup on FATAL errors as well.
 		 */
 		if (MyReplicationSlot != NULL)
 			ReplicationSlotRelease();
#6 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#5)
Re: Replication slot drop message is sent after pgstats shutdown.

At Tue, 31 Aug 2021 17:14:45 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in

On Tue, Aug 31, 2021 at 2:34 PM Andres Freund <andres@anarazel.de> wrote:

On 2021-08-31 14:22:39 +0900, Masahiko Sawada wrote:

You mean to move only the part of sending the message to its own
before_shmem_exit() callback? or move ReplicationSlotRelease() and
ReplicationSlotCleanup() from ProcKill() to it?

The latter.

Makes sense.

I've attached the patch that moves them to its own
before_shmem_exit(). Unless I missed to register the callback it works
the same as before except for where to release and clean up the slots.

Is there any reason we need to register the callback dynamically? It
seems to me what we need to do here is to call the functions at
before-shmem-exit.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#7 Andres Freund
andres@anarazel.de
In reply to: Kyotaro Horiguchi (#6)
Re: Replication slot drop message is sent after pgstats shutdown.

Hi,

On 2021-08-31 18:34:12 +0900, Kyotaro Horiguchi wrote:

At Tue, 31 Aug 2021 17:14:45 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in

On Tue, Aug 31, 2021 at 2:34 PM Andres Freund <andres@anarazel.de> wrote:

On 2021-08-31 14:22:39 +0900, Masahiko Sawada wrote:

You mean to move only the part of sending the message to its own
before_shmem_exit() callback? or move ReplicationSlotRelease() and
ReplicationSlotCleanup() from ProcKill() to it?

The latter.

Makes sense.

I've attached the patch that moves them to its own
before_shmem_exit(). Unless I missed to register the callback it works
the same as before except for where to release and clean up the slots.

Is there any reason we need to register the callback dynamically? It
seems to me what we need to do here is to call the functions at
before-shmem-exit.

+1. I'd just add a ReplicationSlotInitialize() to BaseInit().

Greetings,

Andres Freund

#8 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Andres Freund (#7)
Re: Replication slot drop message is sent after pgstats shutdown.

On Wed, Sep 1, 2021 at 2:39 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-08-31 18:34:12 +0900, Kyotaro Horiguchi wrote:

At Tue, 31 Aug 2021 17:14:45 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in

On Tue, Aug 31, 2021 at 2:34 PM Andres Freund <andres@anarazel.de> wrote:

On 2021-08-31 14:22:39 +0900, Masahiko Sawada wrote:

You mean to move only the part of sending the message to its own
before_shmem_exit() callback? or move ReplicationSlotRelease() and
ReplicationSlotCleanup() from ProcKill() to it?

The latter.

Makes sense.

I've attached the patch that moves them to its own
before_shmem_exit(). Unless I missed to register the callback it works
the same as before except for where to release and clean up the slots.

Is there any reason we need to register the callback dynamically? It
seems to me what we need to do here is to call the functions at
before-shmem-exit.

+1. I'd just add a ReplicationSlotInitialize() to BaseInit().

+1. But BaseInit() is also called by auxiliary processes, which don't
seem to need it. So isn't it better to add it to InitPostgres() or
InitProcess()?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#9 Andres Freund
andres@anarazel.de
In reply to: Masahiko Sawada (#8)
Re: Replication slot drop message is sent after pgstats shutdown.

Hi,

On 2021-09-01 10:05:18 +0900, Masahiko Sawada wrote:

On Wed, Sep 1, 2021 at 2:39 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-08-31 18:34:12 +0900, Kyotaro Horiguchi wrote:

At Tue, 31 Aug 2021 17:14:45 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in

On Tue, Aug 31, 2021 at 2:34 PM Andres Freund <andres@anarazel.de> wrote:
I've attached the patch that moves them to its own
before_shmem_exit(). Unless I missed to register the callback it works
the same as before except for where to release and clean up the slots.

Is there any reason we need to register the callback dynamically? It
seems to me what we need to do here is to call the functions at
before-shmem-exit.

+1. I'd just add a ReplicationSlotInitialize() to BaseInit().

+1. But BaseInit() is also called by auxiliary processes, which seems
not necessary. So isn't it better to add it to InitPostgres() or
InitProcess()?

-0.5 - I think we should default to making the environments more similar,
rather than the opposite. With exceptions for cases where that'd cause
overhead or undue complexity. Which I don't see here?

Greetings,

Andres Freund

#10 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Andres Freund (#9)
1 attachment(s)
Re: Replication slot drop message is sent after pgstats shutdown.

On Wed, Sep 1, 2021 at 12:37 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-09-01 10:05:18 +0900, Masahiko Sawada wrote:

On Wed, Sep 1, 2021 at 2:39 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-08-31 18:34:12 +0900, Kyotaro Horiguchi wrote:

At Tue, 31 Aug 2021 17:14:45 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in

On Tue, Aug 31, 2021 at 2:34 PM Andres Freund <andres@anarazel.de> wrote:
I've attached the patch that moves them to its own
before_shmem_exit(). Unless I missed to register the callback it works
the same as before except for where to release and clean up the slots.

Is there any reason we need to register the callback dynamically? It
seems to me what we need to do here is to call the functions at
before-shmem-exit.

+1. I'd just add a ReplicationSlotInitialize() to BaseInit().

+1. But BaseInit() is also called by auxiliary processes, which seems
not necessary. So isn't it better to add it to InitPostgres() or
InitProcess()?

-0.5 - I think we should default to making the environments more similar,
rather than the opposite. With exceptions for cases where that'd cause
overhead or undue complexity. Which I don't see here?

Sorry for the late response. I'd missed this discussion for some reason.

I agreed with Andres and Horiguchi-san and attached an updated patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v2-0001-Move-replication-slot-release-to-before_shmem_exi.patch (application/octet-stream)
From b2fc61d5267e184951aa2740ff7de2bf7130ed7a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 10 Dec 2021 18:01:36 +0900
Subject: [PATCH v2] Move replication slot release to before_shmem_exit().

Previously, the wal sender releases the replication slot in ProcKill()
on error, resulting in sending the replication slot drop message for
the ephemeral slot to the stats collector after it shut down.

To fix this problem, move replication slot release to a
before_shmem_exit() hook that is called before the stats collector
shuts down.
---
 src/backend/replication/slot.c    | 26 ++++++++++++++++++++++++++
 src/backend/storage/lmgr/proc.c   |  7 -------
 src/backend/tcop/postgres.c       |  5 +++--
 src/backend/utils/init/postinit.c |  7 +++++++
 src/include/replication/slot.h    |  1 +
 5 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..d686f7c0ad 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -46,6 +46,7 @@
 #include "pgstat.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
+#include "storage/ipc.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
@@ -107,6 +108,8 @@ static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
 static void SaveSlotToPath(ReplicationSlot *slot, const char *path, int elevel);
 
+static void ReplicationSlotBeforeShmemExit(int code, Datum arg);
+
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
  */
@@ -160,6 +163,29 @@ ReplicationSlotsShmemInit(void)
 	}
 }
 
+/*
+ * Register the callback for replication slot cleanup and releasing.
+ */
+void
+ReplicationSlotInitialize(void)
+{
+	before_shmem_exit(ReplicationSlotBeforeShmemExit, 0);
+}
+
+/*
+ * Release and cleanup replication slots.
+ */
+static void
+ReplicationSlotBeforeShmemExit(int code, Datum arg)
+{
+	/* Make sure active replication slots are released */
+	if (MyReplicationSlot != NULL)
+		ReplicationSlotRelease();
+
+	/* Also cleanup all the temporary slots. */
+	ReplicationSlotCleanup();
+}
+
 /*
  * Check whether the passed slot name is valid and report errors at elevel.
  *
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index b7d9da0aa9..031a73104e 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -847,13 +847,6 @@ ProcKill(int code, Datum arg)
 	/* Cancel any pending condition variable sleep, too */
 	ConditionVariableCancelSleep();
 
-	/* Make sure active replication slots are released */
-	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
-
-	/* Also cleanup all the temporary slots. */
-	ReplicationSlotCleanup();
-
 	/*
 	 * Detach from any lock group of which we are a member.  If the leader
 	 * exist before all other group members, its PGPROC will remain allocated
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 82de01cdc6..dc9d836297 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4261,8 +4261,9 @@ PostgresMain(const char *dbname, const char *username)
 		 * We can't release replication slots inside AbortTransaction() as we
 		 * need to be able to start and abort transactions while having a slot
 		 * acquired. But we never need to hold them across top level errors,
-		 * so releasing here is fine. There's another cleanup in ProcKill()
-		 * ensuring we'll correctly cleanup on FATAL errors as well.
+		 * so releasing here is fine. There's another cleanup in
+		 * ReplicationSlotBeforeShmemExit() callback ensuring we'll correctly
+		 * cleanup on FATAL errors as well.
 		 */
 		if (MyReplicationSlot != NULL)
 			ReplicationSlotRelease();
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 646126edee..6cf3b3c961 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -40,6 +40,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
 #include "storage/fd.h"
@@ -547,6 +548,12 @@ BaseInit(void)
 	 * ever try to insert XLOG.
 	 */
 	InitXLogInsert();
+
+	/*
+	 * Register callback to make sure cleanup and releasing the replication
+	 * slot on exit.
+	 */
+	ReplicationSlotInitialize();
 }
 
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53d773ccff..2f4b123c87 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -206,6 +206,7 @@ extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
 
 /* misc stuff */
+extern void ReplicationSlotInitialize(void);
 extern bool ReplicationSlotValidateName(const char *name, int elevel);
 extern void ReplicationSlotReserveWal(void);
 extern void ReplicationSlotsComputeRequiredXmin(bool already_locked);
-- 
2.24.3 (Apple Git-128)

#11 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#10)
1 attachment(s)
Re: Replication slot drop message is sent after pgstats shutdown.

At Fri, 10 Dec 2021 18:13:31 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in

I agreed with Andres and Horiguchi-san and attached an updated patch.

Thanks for the new version.

It seems fine, but I have some comments from a cosmetic viewpoint.

+	/*
+	 * Register callback to make sure cleanup and releasing the replication
+	 * slot on exit.
+	 */
+	ReplicationSlotInitialize();

Actually, registering the callback is all the function does, but the
comment looks slightly inconsistent with the function name. I think we
can call it just "initialization" here. We don't need to change the
function comment the same way, but either is fine with me.

+ReplicationSlotBeforeShmemExit(int code, Datum arg)

The name looks a bit too verbose. Doesn't just "ReplicationSlotShmemExit" work?

-		 * so releasing here is fine. There's another cleanup in ProcKill()
-		 * ensuring we'll correctly cleanup on FATAL errors as well.
+		 * so releasing here is fine. There's another cleanup in
+		 * ReplicationSlotBeforeShmemExit() callback ensuring we'll correctly
+		 * cleanup on FATAL errors as well.

I'd prefer "during before_shmem_exit()" than "in
ReplicationSlotBeforeShmemExit() callback" here. (But the current
wording is also fine by me.)

The attached test detects that bug, but I'm not sure it's worth the
extra test time, or whether it should rather live in the server test
suite.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

replslot_shutdown_crash_check.txt (text/plain)
diff --git a/src/bin/pg_basebackup/t/030_pg_recvlogical.pl b/src/bin/pg_basebackup/t/030_pg_recvlogical.pl
index 90da1662e3..0fb4e67697 100644
--- a/src/bin/pg_basebackup/t/030_pg_recvlogical.pl
+++ b/src/bin/pg_basebackup/t/030_pg_recvlogical.pl
@@ -5,7 +5,8 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 20;
+use Test::More tests => 25;
+use IPC::Run qw(pump finish timer);
 
 program_help_ok('pg_recvlogical');
 program_version_ok('pg_recvlogical');
@@ -106,3 +107,44 @@ $node->command_ok(
 		'--start', '--endpos', "$nextlsn", '--no-loop', '-f', '-'
 	],
 	'replayed a two-phase transaction');
+
+## Check for a crash bug caused by replication-slot cleanup after
+## pgstat shutdown.
+#fire up an interactive psql session
+my $in  = '';
+my $out = '';
+my $timer = timer(5);
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+like($out, qr/psql/, "print startup banner");
+
+# open a transaction
+$out = "";
+$in .= "BEGIN;\nCREATE TABLE a (a int);\n";
+pump $h until ($out =~ /CREATE TABLE/ || timer->is_expired);
+ok(!timer->is_expired, 'background CREATE TABLE passed');
+
+# this recvlogical waits for the transaction ends
+ok(open(my $recvlogical, '-|',
+		'pg_recvlogical', '--create-slot', '-S', 'test2',
+		'-d', $node->connstr('postgres')),
+   'launch background pg_recvlogical');
+
+$node->poll_query_until('postgres',
+			qq{SELECT count(*) > 0 FROM pg_stat_activity 
+						WHERE backend_type='walsender'
+						AND query like 'CREATE_REPLICATION_SLOT %';});
+# stop server while it hangs. This shouldn't crash server.
+$node->stop;
+ok(open(my $cont, '-|', 'pg_controldata', $node->data_dir),
+   'run pg_controldata');
+my $stop_result = '';
+while (<$cont>)
+{
+	if (/^Database cluster state: *([^ ].*)$/)
+	{
+		$stop_result = $1;
+		last;
+	}
+}
+
+is($stop_result, 'shut down', 'server is properly shut down');
#12 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro Horiguchi (#11)
1 attachment(s)
Re: Replication slot drop message is sent after pgstats shutdown.

On Mon, Dec 13, 2021 at 12:11 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Fri, 10 Dec 2021 18:13:31 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in

I agreed with Andres and Horiguchi-san and attached an updated patch.

Thanks for the new version.

It seems fine, but I have some comments from a cosmetic viewpoint.

+       /*
+        * Register callback to make sure cleanup and releasing the replication
+        * slot on exit.
+        */
+       ReplicationSlotInitialize();

Actually all the function does is that but it looks slightly
inconsistent with the function name. I think we can call it just
"initialization" here. I think we don't need to change the function
comment the same way but either will do for me.

+ReplicationSlotBeforeShmemExit(int code, Datum arg)

The name looks a bit too verbose. Doesn't just "ReplicationSlotShmemExit" work?

-                * so releasing here is fine. There's another cleanup in ProcKill()
-                * ensuring we'll correctly cleanup on FATAL errors as well.
+                * so releasing here is fine. There's another cleanup in
+                * ReplicationSlotBeforeShmemExit() callback ensuring we'll correctly
+                * cleanup on FATAL errors as well.

I'd prefer "during before_shmem_exit()" than "in
ReplicationSlotBeforeShmemExit() callback" here. (But the current
wording is also fine by me.)

Thank you for the comments! Agreed with all comments.

I've attached an updated patch. Please review it.

The attached detects that bug, but I'm not sure it's worth expending
test time, or this might be in the server test suit.

Thanks. It's convenient for testing this issue, but I'm also not sure
it's worth adding to the test suite.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v3-0001-Move-replication-slot-release-to-before_shmem_exi.patch (application/octet-stream)
From f9f4b9cc1f21b80aeaf23c1d58534c99347ffb30 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 10 Dec 2021 18:01:36 +0900
Subject: [PATCH v3] Move replication slot release to before_shmem_exit().

Previously, the wal sender releases the replication slot in ProcKill()
on error, resulting in sending the replication slot drop message for
the ephemeral slot to the stats collector after it shut down.

To fix this problem, move replication slot release to a
before_shmem_exit() hook that is called before the stats collector
shuts down.
---
 src/backend/replication/slot.c    | 26 ++++++++++++++++++++++++++
 src/backend/storage/lmgr/proc.c   |  7 -------
 src/backend/tcop/postgres.c       |  5 +++--
 src/backend/utils/init/postinit.c |  6 ++++++
 src/include/replication/slot.h    |  1 +
 5 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..25124264fd 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -46,6 +46,7 @@
 #include "pgstat.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
+#include "storage/ipc.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
@@ -107,6 +108,8 @@ static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
 static void SaveSlotToPath(ReplicationSlot *slot, const char *path, int elevel);
 
+static void ReplicationSlotShmemExit(int code, Datum arg);
+
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
  */
@@ -160,6 +163,29 @@ ReplicationSlotsShmemInit(void)
 	}
 }
 
+/*
+ * Register the callback for replication slot cleanup and releasing.
+ */
+void
+ReplicationSlotInitialize(void)
+{
+	before_shmem_exit(ReplicationSlotShmemExit, 0);
+}
+
+/*
+ * Release and cleanup replication slots.
+ */
+static void
+ReplicationSlotShmemExit(int code, Datum arg)
+{
+	/* Make sure active replication slots are released */
+	if (MyReplicationSlot != NULL)
+		ReplicationSlotRelease();
+
+	/* Also cleanup all the temporary slots. */
+	ReplicationSlotCleanup();
+}
+
 /*
  * Check whether the passed slot name is valid and report errors at elevel.
  *
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index d1d3cd0dc8..f68975b461 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -830,13 +830,6 @@ ProcKill(int code, Datum arg)
 	/* Cancel any pending condition variable sleep, too */
 	ConditionVariableCancelSleep();
 
-	/* Make sure active replication slots are released */
-	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
-
-	/* Also cleanup all the temporary slots. */
-	ReplicationSlotCleanup();
-
 	/*
 	 * Detach from any lock group of which we are a member.  If the leader
 	 * exist before all other group members, its PGPROC will remain allocated
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 82de01cdc6..ab633d03bf 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4261,8 +4261,9 @@ PostgresMain(const char *dbname, const char *username)
 		 * We can't release replication slots inside AbortTransaction() as we
 		 * need to be able to start and abort transactions while having a slot
 		 * acquired. But we never need to hold them across top level errors,
-		 * so releasing here is fine. There's another cleanup in ProcKill()
-		 * ensuring we'll correctly cleanup on FATAL errors as well.
+		 * so releasing here is fine. There's another cleanup in
+		 * before_shmem_exit() callback ensuring we'll correctly cleanup on
+		 * FATAL errors as well.
 		 */
 		if (MyReplicationSlot != NULL)
 			ReplicationSlotRelease();
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7292e51f7d..12c012e664 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -40,6 +40,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
 #include "storage/fd.h"
@@ -547,6 +548,11 @@ BaseInit(void)
 	 * ever try to insert XLOG.
 	 */
 	InitXLogInsert();
+
+	/*
+	 * Initialize replication slots.
+	 */
+	ReplicationSlotInitialize();
 }
 
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53d773ccff..2f4b123c87 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -206,6 +206,7 @@ extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
 
 /* misc stuff */
+extern void ReplicationSlotInitialize(void);
 extern bool ReplicationSlotValidateName(const char *name, int elevel);
 extern void ReplicationSlotReserveWal(void);
 extern void ReplicationSlotsComputeRequiredXmin(bool already_locked);
-- 
2.24.3 (Apple Git-128)

#13 Andres Freund
andres@anarazel.de
In reply to: Masahiko Sawada (#12)
1 attachment(s)
Re: Replication slot drop message is sent after pgstats shutdown.

Hi,

On 2021-12-22 22:34:45 +0900, Masahiko Sawada wrote:

I've attached an updated patch. Please review it.

Sorry for dropping the ball on this again :(. I've pushed the fix with some
very minor polishing.

The attached detects that bug, but I'm not sure it's worth expending
test time, or this might be in the server test suit.

Thanks. It's convenient to test this issue but I'm also not sure it's
worth adding to the test suit.

I think it's definitely worth adding a test, but I don't particularly like the
specific test implementation. Primarily because I think it's better to test
this in a cluster that stays running, so that we can verify that the slot drop
worked. It also doesn't seem necessary to create a separate cluster.

I wrote the attached isolation test. I ended up not committing it yet - I'm
worried that there could be some OS dependent output difference, due to some
libpq error handling issues. See [1], which Tom pointed out is caused by the
issue discussed in [2].

Greetings,

Andres Freund

[1]: /messages/by-id/20220215004143.dlzsn72oqsmqa7uw@alap3.anarazel.de
[2]: /messages/by-id/20220215004143.dlzsn72oqsmqa7uw@alap3.anarazel.de

Attachments:

v1-0001-Add-isolation-test-for-errors-during-logical-slot.patch (text/x-diff)
From 343106442eafde4fcc55cc75e78501010d50a883 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 14 Feb 2022 17:17:37 -0800
Subject: [PATCH v1] Add isolation test for errors during logical slot
 creation.

Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
---
 contrib/test_decoding/Makefile                |   2 +-
 .../expected/slot_creation_error.out          | 113 ++++++++++++++++++
 .../specs/slot_creation_error.spec            |  41 +++++++
 3 files changed, 155 insertions(+), 1 deletion(-)
 create mode 100644 contrib/test_decoding/expected/slot_creation_error.out
 create mode 100644 contrib/test_decoding/specs/slot_creation_error.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 56ddc3abaeb..36929dd97d3 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -9,7 +9,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 	sequence
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
-	twophase_snapshot
+	twophase_snapshot slot_creation_error
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/slot_creation_error.out b/contrib/test_decoding/expected/slot_creation_error.out
new file mode 100644
index 00000000000..fc1c98b5248
--- /dev/null
+++ b/contrib/test_decoding/expected/slot_creation_error.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c
+step s1_b: BEGIN ISOLATION LEVEL SERIALIZABLE;
+step s1_xid: SELECT 'xid' FROM txid_current();
+?column?
+--------
+xid     
+(1 row)
+
+step s2_init: 
+    SELECT 'init' FROM pg_create_logical_replication_slot('slot_creation_error', 'test_decoding');
+ <waiting ...>
+step s1_view_slot: 
+    SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'
+
+slot_name          |slot_type|active
+-------------------+---------+------
+slot_creation_error|logical  |t     
+(1 row)
+
+step s1_cancel_s2: 
+    SELECT pg_cancel_backend(pid)
+    FROM pg_stat_activity
+    WHERE application_name = 'isolation/slot_creation_error/s2';
+
+pg_cancel_backend
+-----------------
+t                
+(1 row)
+
+step s2_init: <... completed>
+ERROR:  canceling statement due to user request
+step s1_view_slot: 
+    SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'
+
+slot_name|slot_type|active
+---------+---------+------
+(0 rows)
+
+step s1_c: COMMIT;
+
+starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot
+step s1_b: BEGIN ISOLATION LEVEL SERIALIZABLE;
+step s1_xid: SELECT 'xid' FROM txid_current();
+?column?
+--------
+xid     
+(1 row)
+
+step s2_init: 
+    SELECT 'init' FROM pg_create_logical_replication_slot('slot_creation_error', 'test_decoding');
+ <waiting ...>
+step s1_c: COMMIT;
+step s2_init: <... completed>
+?column?
+--------
+init    
+(1 row)
+
+step s1_view_slot: 
+    SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'
+
+slot_name          |slot_type|active
+-------------------+---------+------
+slot_creation_error|logical  |f     
+(1 row)
+
+step s1_drop_slot: 
+    SELECT pg_drop_replication_slot('slot_creation_error');
+
+pg_drop_replication_slot
+------------------------
+                        
+(1 row)
+
+
+starting permutation: s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot
+step s1_b: BEGIN ISOLATION LEVEL SERIALIZABLE;
+step s1_xid: SELECT 'xid' FROM txid_current();
+?column?
+--------
+xid     
+(1 row)
+
+step s2_init: 
+    SELECT 'init' FROM pg_create_logical_replication_slot('slot_creation_error', 'test_decoding');
+ <waiting ...>
+step s1_terminate_s2: 
+    SELECT pg_terminate_backend(pid)
+    FROM pg_stat_activity
+    WHERE application_name = 'isolation/slot_creation_error/s2';
+
+pg_terminate_backend
+--------------------
+t                   
+(1 row)
+
+step s2_init: <... completed>
+FATAL:  terminating connection due to administrator command
+FATAL:  terminating connection due to administrator command
+server closed the connection unexpectedly
+	This probably means the server terminated abnormally
+	before or while processing the request.
+
+step s1_c: COMMIT;
+step s1_view_slot: 
+    SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'
+
+slot_name|slot_type|active
+---------+---------+------
+(0 rows)
+
diff --git a/contrib/test_decoding/specs/slot_creation_error.spec b/contrib/test_decoding/specs/slot_creation_error.spec
new file mode 100644
index 00000000000..582b282331f
--- /dev/null
+++ b/contrib/test_decoding/specs/slot_creation_error.spec
@@ -0,0 +1,41 @@
+# Test that erroring out during logical slot creation is handled properly
+
+session "s1"
+setup { SET synchronous_commit=on; }
+
+step s1_b { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step s1_xid { SELECT 'xid' FROM txid_current(); }
+step s1_c { COMMIT; }
+step s1_cancel_s2 {
+    SELECT pg_cancel_backend(pid)
+    FROM pg_stat_activity
+    WHERE application_name = 'isolation/slot_creation_error/s2';
+}
+
+step s1_terminate_s2 {
+    SELECT pg_terminate_backend(pid)
+    FROM pg_stat_activity
+    WHERE application_name = 'isolation/slot_creation_error/s2';
+}
+
+step s1_view_slot {
+    SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'
+}
+
+step s1_drop_slot {
+    SELECT pg_drop_replication_slot('slot_creation_error');
+}
+
+session s2
+setup { SET synchronous_commit=on; }
+step s2_init {
+    SELECT 'init' FROM pg_create_logical_replication_slot('slot_creation_error', 'test_decoding');
+}
+
+# The tests first start a transaction with an xid assigned in s1, then create
+# a slot in s2. The slot creation waits for s1's transaction to end. Instead
+# we cancel / terminate s2.
+permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c
+permutation s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot # check slot creation still works
+permutation s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot
+# can't run tests after this, due to s2's connection failure
-- 
2.34.0

#14 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andres Freund (#13)
Re: Replication slot drop message is sent after pgstats shutdown.

At Mon, 14 Feb 2022 17:20:16 -0800, Andres Freund <andres@anarazel.de> wrote in

Hi,

On 2021-12-22 22:34:45 +0900, Masahiko Sawada wrote:

I've attached an updated patch. Please review it.

Sorry for dropping the ball on this again :(. I've pushed the fix with some
very minor polishing.

Thanks!

The attached detects that bug, but I'm not sure it's worth expending
test time, or this might be in the server test suit.

Thanks. It's convenient to test this issue but I'm also not sure it's
worth adding to the test suit.

I think it's definitely worth adding a test, but I don't particularly like the
specific test implementation. Primarily because I think it's better to test
this in a cluster that stays running, so that we can verify that the slot drop
worked. It also doesn't seem necessary to create a separate cluster.

One of the points I was not satisfied with in the TAP test is the second
point above. FWIW I agree with the direction of the proposed test.

I wrote the attached isolation test. I ended up not committing it yet - I'm
worried that there could be some OS dependent output difference, due to some
libpq error handling issues. See [1], which Tom pointed out is caused by the
issue discussed in [2].

Mmm.. This is..
slot_creation_error.out

step s2_init: <... completed>
FATAL: terminating connection due to administrator command
FATAL: terminating connection due to administrator command

Greetings,

Andres Freund

[1] /messages/by-id/20220215004143.dlzsn72oqsmqa7uw@alap3.anarazel.de
[2] /messages/by-id/20220215004143.dlzsn72oqsmqa7uw@alap3.anarazel.de

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#15 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro Horiguchi (#14)
Re: Replication slot drop message is sent after pgstats shutdown.

On Tue, Feb 15, 2022 at 12:09 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 14 Feb 2022 17:20:16 -0800, Andres Freund <andres@anarazel.de> wrote in

Hi,

On 2021-12-22 22:34:45 +0900, Masahiko Sawada wrote:

I've attached an updated patch. Please review it.

Sorry for dropping the ball on this again :(. I've pushed the fix with some
very minor polishing.

Thanks!

The attached detects that bug, but I'm not sure it's worth expending
test time, or this might be in the server test suit.

Thanks. It's convenient to test this issue but I'm also not sure it's
worth adding to the test suit.

I think it's definitely worth adding a test, but I don't particularly like the
specific test implementation. Primarily because I think it's better to test
this in a cluster that stays running, so that we can verify that the slot drop
worked. It also doesn't seem necessary to create a separate cluster.

FWIW I agree to the proposed test on the direction.

+1

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#16 Andres Freund
andres@anarazel.de
In reply to: Masahiko Sawada (#15)
Re: Replication slot drop message is sent after pgstats shutdown.

Hi,

On 2022-02-15 14:07:26 +0900, Masahiko Sawada wrote:

On Tue, Feb 15, 2022 at 12:09 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

FWIW I agree to the proposed test on the direction.

+1

Pushed the test yesterday evening, after Tom checked if it is likely to be
problematic. Seems to worked without problems so far.

Greetings,

Andres Freund

#17 Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#16)
1 attachment(s)
Re: Replication slot drop message is sent after pgstats shutdown.

On Tue, Feb 15, 2022 at 08:58:56AM -0800, Andres Freund wrote:

Pushed the test yesterday evening, after Tom checked if it is likely to be
problematic. Seems to worked without problems so far.

wrasse │ 2022-02-15 09:29:06 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-02-15%2009%3A29%3A06
flaviventris │ 2022-02-24 15:17:30 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2022-02-24%2015%3A17%3A30
calliphoridae │ 2022-03-08 01:14:51 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2022-03-08%2001%3A14%3A51

The buildfarm failed to convey adequate logs for this particular test suite.
Here's regression.diffs from the wrasse case (saved via keep_error_builds):

===
diff -U3 /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out
--- /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out	Tue Feb 15 06:58:14 2022
+++ /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out	Tue Feb 15 11:38:14 2022
@@ -29,16 +29,17 @@
 t                
 (1 row)

-step s2_init: <... completed>
-ERROR: canceling statement due to user request
step s1_view_slot:
SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'

-slot_name|slot_type|active
----------+---------+------
-(0 rows)
+slot_name          |slot_type|active
+-------------------+---------+------
+slot_creation_error|logical  |t     
+(1 row)
 step s1_c: COMMIT;
+step s2_init: <... completed>
+ERROR:  canceling statement due to user request

starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot
step s1_b: BEGIN;
===

I can make it fail that way by injecting a 1s delay here:

--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3339,6 +3339,7 @@ ProcessInterrupts(void)
 		 */
 		if (!DoingCommandRead)
 		{
+			pg_usleep(1 * 1000 * 1000);
 			LockErrorCleanup();
 			ereport(ERROR,
 					(errcode(ERRCODE_QUERY_CANCELED),

I plan to fix this as attached, similar to how commit c04c767 fixed the same
challenge in detach-partition-concurrently-[34].

Attachments:

slot_creation_error-cancel-race-v1.patch (text/plain)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Close race condition in slot_creation_error.spec.
    
    Use the pattern from detach-partition-concurrently-3.spec.  Per
    buildfarm member wrasse.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/contrib/test_decoding/expected/slot_creation_error.out b/contrib/test_decoding/expected/slot_creation_error.out
index 043bdae..25883b5 100644
--- a/contrib/test_decoding/expected/slot_creation_error.out
+++ b/contrib/test_decoding/expected/slot_creation_error.out
@@ -23,14 +23,15 @@ step s1_cancel_s2:
     SELECT pg_cancel_backend(pid)
     FROM pg_stat_activity
     WHERE application_name = 'isolation/slot_creation_error/s2';
-
+ <waiting ...>
+step s2_init: <... completed>
+ERROR:  canceling statement due to user request
+step s1_cancel_s2: <... completed>
 pg_cancel_backend
 -----------------
 t                
 (1 row)
 
-step s2_init: <... completed>
-ERROR:  canceling statement due to user request
 step s1_view_slot: 
     SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'
 
@@ -90,18 +91,19 @@ step s1_terminate_s2:
     SELECT pg_terminate_backend(pid)
     FROM pg_stat_activity
     WHERE application_name = 'isolation/slot_creation_error/s2';
-
-pg_terminate_backend
---------------------
-t                   
-(1 row)
-
+ <waiting ...>
 step s2_init: <... completed>
 FATAL:  terminating connection due to administrator command
 server closed the connection unexpectedly
 	This probably means the server terminated abnormally
 	before or while processing the request.
 
+step s1_terminate_s2: <... completed>
+pg_terminate_backend
+--------------------
+t                   
+(1 row)
+
 step s1_c: COMMIT;
 step s1_view_slot: 
     SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'
diff --git a/contrib/test_decoding/specs/slot_creation_error.spec b/contrib/test_decoding/specs/slot_creation_error.spec
index 6816696..d1e35bf 100644
--- a/contrib/test_decoding/specs/slot_creation_error.spec
+++ b/contrib/test_decoding/specs/slot_creation_error.spec
@@ -35,7 +35,7 @@ step s2_init {
 # The tests first start a transaction with an xid assigned in s1, then create
 # a slot in s2. The slot creation waits for s1's transaction to end. Instead
 # we cancel / terminate s2.
-permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c
+permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2(s2_init) s1_view_slot s1_c
 permutation s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot # check slot creation still works
-permutation s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot
+permutation s1_b s1_xid s2_init s1_terminate_s2(s2_init) s1_c s1_view_slot
 # can't run tests after this, due to s2's connection failure
#18 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Noah Misch (#17)
Re: Replication slot drop message is sent after pgstats shutdown.

At Fri, 18 Mar 2022 00:28:37 -0700, Noah Misch <noah@leadboat.com> wrote in

===
diff -U3 /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out
--- /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out	Tue Feb 15 06:58:14 2022
+++ /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out	Tue Feb 15 11:38:14 2022

I plan to fix this as attached, similar to how commit c04c767 fixed the same
challenge in detach-partition-concurrently-[34].

It looks correct and I confirmed that it works.

It looks like a similar issue to [1], but this one is cleaner and more stable.

[1]: /messages/by-id/20220304.113347.2105652521035137491.horikyota.ntt@gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#19 Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#17)
Re: Replication slot drop message is sent after pgstats shutdown.

Hi,

On 2022-03-18 00:28:37 -0700, Noah Misch wrote:

On Tue, Feb 15, 2022 at 08:58:56AM -0800, Andres Freund wrote:

Pushed the test yesterday evening, after Tom checked if it is likely to be
problematic. Seems to worked without problems so far.

wrasse │ 2022-02-15 09:29:06 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-02-15%2009%3A29%3A06
flaviventris │ 2022-02-24 15:17:30 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2022-02-24%2015%3A17%3A30
calliphoridae │ 2022-03-08 01:14:51 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2022-03-08%2001%3A14%3A51

Huh. Weirdly enough I saw this failure twice in a development branch
yesterday...

The buildfarm failed to convey adequate logs for this particular test suite.
Here's regression.diffs from the wrasse case (saved via keep_error_builds):

Thanks for getting that!

===
diff -U3 /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out
--- /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out	Tue Feb 15 06:58:14 2022
+++ /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out	Tue Feb 15 11:38:14 2022
@@ -29,16 +29,17 @@
t                
(1 row)

-step s2_init: <... completed>
-ERROR: canceling statement due to user request
step s1_view_slot:
SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'

-slot_name|slot_type|active
----------+---------+------
-(0 rows)
+slot_name          |slot_type|active
+-------------------+---------+------
+slot_creation_error|logical  |t     
+(1 row)
step s1_c: COMMIT;
+step s2_init: <... completed>
+ERROR:  canceling statement due to user request

starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot
step s1_b: BEGIN;
===

I can make it fail that way by injecting a 1s delay here:

--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3339,6 +3339,7 @@ ProcessInterrupts(void)
*/
if (!DoingCommandRead)
{
+			pg_usleep(1 * 1000 * 1000);
LockErrorCleanup();
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),

So isolationtester still sees the blocking condition from before the cancel
processing is finished and thus proceeds to the next statement - which it
normally should only do once the other running step is detected as still
blocking?

I wonder if we should emit <waiting> every time a step is newly detected
as being blocked, to make it easier to understand issues like this.

diff --git a/contrib/test_decoding/specs/slot_creation_error.spec b/contrib/test_decoding/specs/slot_creation_error.spec
index 6816696..d1e35bf 100644
--- a/contrib/test_decoding/specs/slot_creation_error.spec
+++ b/contrib/test_decoding/specs/slot_creation_error.spec
@@ -35,7 +35,7 @@ step s2_init {
# The tests first start a transaction with an xid assigned in s1, then create
# a slot in s2. The slot creation waits for s1's transaction to end. Instead
# we cancel / terminate s2.
-permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c
+permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2(s2_init) s1_view_slot s1_c
permutation s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot # check slot creation still works
-permutation s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot
+permutation s1_b s1_xid s2_init s1_terminate_s2(s2_init) s1_c s1_view_slot
# can't run tests after this, due to s2's connection failure

That looks good to me.

Thanks!

Andres Freund

#20 Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#19)
Re: Replication slot drop message is sent after pgstats shutdown.

On Fri, Mar 18, 2022 at 01:24:15PM -0700, Andres Freund wrote:

On 2022-03-18 00:28:37 -0700, Noah Misch wrote:

===
diff -U3 /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out
--- /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out	Tue Feb 15 06:58:14 2022
+++ /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out	Tue Feb 15 11:38:14 2022
@@ -29,16 +29,17 @@
t                
(1 row)

-step s2_init: <... completed>
-ERROR: canceling statement due to user request
step s1_view_slot:
SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'

-slot_name|slot_type|active
----------+---------+------
-(0 rows)
+slot_name          |slot_type|active
+-------------------+---------+------
+slot_creation_error|logical  |t     
+(1 row)
step s1_c: COMMIT;
+step s2_init: <... completed>
+ERROR:  canceling statement due to user request

starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot
step s1_b: BEGIN;
===

I can make it fail that way by injecting a 1s delay here:

--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3339,6 +3339,7 @@ ProcessInterrupts(void)
*/
if (!DoingCommandRead)
{
+			pg_usleep(1 * 1000 * 1000);
LockErrorCleanup();
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),

So isolationtester still sees the blocking condition from before the cancel
processing is finished and thus proceeds to the next statement - which it
normally should only do once the other running step is detected as still
blocking?

Essentially. It called s1_view_slot too early. s2_init can remain blocked
arbitrarily long after pg_cancel_backend returns. Writing
s1_cancel_s2(s2_init) directs the isolationtester to send the cancel, then
wait for s2_init to finish, then wait for the cancel to finish.

I wonder if we should emit <waiting> everytime a step is detected anew as
being blocked to make it easier to understand issues like this.

Good question.

diff --git a/contrib/test_decoding/specs/slot_creation_error.spec b/contrib/test_decoding/specs/slot_creation_error.spec
index 6816696..d1e35bf 100644
--- a/contrib/test_decoding/specs/slot_creation_error.spec
+++ b/contrib/test_decoding/specs/slot_creation_error.spec
@@ -35,7 +35,7 @@ step s2_init {
# The tests first start a transaction with an xid assigned in s1, then create
# a slot in s2. The slot creation waits for s1's transaction to end. Instead
# we cancel / terminate s2.
-permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c
+permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2(s2_init) s1_view_slot s1_c
permutation s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot # check slot creation still works
-permutation s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot
+permutation s1_b s1_xid s2_init s1_terminate_s2(s2_init) s1_c s1_view_slot
# can't run tests after this, due to s2's connection failure

That looks good to me.

Pushed. Kyotaro Horiguchi had posted a patch that also changed the
pg_terminate_backend call in temp-schema-cleanup.spec. I think that one is
fine as-is, because it does pg_terminate_backend(pg_backend_pid()). There's
no way for a backend running that particular command to report that it's ready
for another query, so the problem doesn't arise.