Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

Started by Bharath Rupireddyabout 4 years ago49 messages
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com

Hi,

The active_pid of ReplicationSlot structure, which tells whether a
replication slot is active or inactive, isn't persisted to the disk
i.e has no entry in ReplicationSlotPersistentData structure. Isn't it
better if we add that info to ReplicationSlotPersistentData structure
and persist to the disk? This will help to know what were the inactive
replication slots in case the server goes down or crashes for some
reason. Currently, we don't have a way to interpret the replication
slot info in the disk but there's a patch for pg_replslotdata tool at
[1]: /messages/by-id/CALj2ACW0rV5gWK8A3m6_X62qH+Vfaq5hznC=i0R5Wojt5+yhyw@mail.gmail.com
down/crash and figure out which replication slots to remove to bring
the server up and running without touching the other replication
slots.

Thoughts?

[1]: /messages/by-id/CALj2ACW0rV5gWK8A3m6_X62qH+Vfaq5hznC=i0R5Wojt5+yhyw@mail.gmail.com

Regards,
Bharath Rupireddy.

#2Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Bharath Rupireddy (#1)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On 2021-Dec-01, Bharath Rupireddy wrote:

The active_pid of ReplicationSlot structure, which tells whether a
replication slot is active or inactive, isn't persisted to the disk
i.e has no entry in ReplicationSlotPersistentData structure. Isn't it
better if we add that info to ReplicationSlotPersistentData structure
and persist to the disk? This will help to know what were the inactive
replication slots in case the server goes down or crashes for some
reason. Currently, we don't have a way to interpret the replication
slot info in the disk but there's a patch for pg_replslotdata tool at
[1]. This way, one can figure out the reasons for the server
down/crash and figure out which replication slots to remove to bring
the server up and running without touching the other replication
slots.

I think the PIDs are log-worthy for sure, but it's not clear to me that
it is desirable to write them to the persistent state file. In case of
crashes, the log should serve just fine to aid root cause investigation
-- in fact even better than the persistent file, where the data would be
lost as soon as the next client acquires that slot.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"The Postgresql hackers have what I call a "NASA space shot" mentality.
Quite refreshing in a world of "weekend drag racer" developers."
(Scott Marlowe)

#3Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#2)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Dec-01, Bharath Rupireddy wrote:

The active_pid of ReplicationSlot structure, which tells whether a
replication slot is active or inactive, isn't persisted to the disk
i.e has no entry in ReplicationSlotPersistentData structure. Isn't it
better if we add that info to ReplicationSlotPersistentData structure
and persist to the disk? This will help to know what were the inactive
replication slots in case the server goes down or crashes for some
reason. Currently, we don't have a way to interpret the replication
slot info in the disk but there's a patch for pg_replslotdata tool at
[1]. This way, one can figure out the reasons for the server
down/crash and figure out which replication slots to remove to bring
the server up and running without touching the other replication
slots.

I think the PIDs are log-worthy for sure, but it's not clear to me that
it is desirable to write them to the persistent state file. In case of
crashes, the log should serve just fine to aid root cause investigation
-- in fact even better than the persistent file, where the data would be
lost as soon as the next client acquires that slot.

Thanks. +1 to log a message at LOG level whenever a replication slot
becomes active (gets assigned a valid pid to active_pid) and becomes
inactive(gets assigned 0 to active_pid). Having said that, isn't it
also helpful if we write a bool (1 byte character) whenever the slot
becomes active and inactive to the disk?

Regards,
Bharath Rupireddy.

#4Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#3)
1 attachment(s)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Dec-01, Bharath Rupireddy wrote:

The active_pid of ReplicationSlot structure, which tells whether a
replication slot is active or inactive, isn't persisted to the disk
i.e has no entry in ReplicationSlotPersistentData structure. Isn't it
better if we add that info to ReplicationSlotPersistentData structure
and persist to the disk? This will help to know what were the inactive
replication slots in case the server goes down or crashes for some
reason. Currently, we don't have a way to interpret the replication
slot info in the disk but there's a patch for pg_replslotdata tool at
[1]. This way, one can figure out the reasons for the server
down/crash and figure out which replication slots to remove to bring
the server up and running without touching the other replication
slots.

I think the PIDs are log-worthy for sure, but it's not clear to me that
it is desirable to write them to the persistent state file. In case of
crashes, the log should serve just fine to aid root cause investigation
-- in fact even better than the persistent file, where the data would be
lost as soon as the next client acquires that slot.

Thanks. +1 to log a message at LOG level whenever a replication slot
becomes active (gets assigned a valid pid to active_pid) and becomes
inactive(gets assigned 0 to active_pid). Having said that, isn't it
also helpful if we write a bool (1 byte character) whenever the slot
becomes active and inactive to the disk?

Here's the patch that adds a LOG message whenever a replication slot
becomes active and inactive. These logs will be extremely useful on
production servers to debug and analyze inactive replication slot
issues.

Thoughts?

Regards,
Bharath Rupireddy.

Attachments:

v1-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v1-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload
From ec1a6b0bc6e9616c02acbf9e63a00746019bc83e Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 14 Dec 2021 13:31:26 +0000
Subject: [PATCH v1] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 src/backend/replication/slot.c | 39 ++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..63bf1140ec 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -318,6 +318,11 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	SpinLockRelease(&slot->mutex);
 	MyReplicationSlot = slot;
 
+	ereport(LOG,
+			(errmsg("replication slot \"%s\" becomes active",
+					NameStr(slot->data.name)),
+			 errdetail("The process with PID %d acquired it.", slot->active_pid)));
+
 	LWLockRelease(ReplicationSlotControlLock);
 
 	/*
@@ -457,6 +462,11 @@ retry:
 
 	/* We made this slot active, so it's ours now. */
 	MyReplicationSlot = s;
+
+	ereport(LOG,
+			(errmsg("replication slot \"%s\" becomes active",
+					NameStr(s->data.name)),
+			errdetail("The process with PID %d acquired it.", s->active_pid)));
 }
 
 /*
@@ -499,6 +509,10 @@ ReplicationSlotRelease(void)
 
 	if (slot->data.persistency == RS_PERSISTENT)
 	{
+		int	pid;
+
+		pid = slot->active_pid;
+
 		/*
 		 * Mark persistent slot inactive.  We're not freeing it, just
 		 * disconnecting, but wake up others that may be waiting for it.
@@ -506,6 +520,12 @@ ReplicationSlotRelease(void)
 		SpinLockAcquire(&slot->mutex);
 		slot->active_pid = 0;
 		SpinLockRelease(&slot->mutex);
+
+		ereport(LOG,
+				(errmsg("replication slot \"%s\" becomes inactive",
+						NameStr(slot->data.name)),
+				 errdetail("The process with PID %d released it.", pid)));
+
 		ConditionVariableBroadcast(&slot->active_cv);
 	}
 
@@ -594,6 +614,7 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 {
 	char		path[MAXPGPATH];
 	char		tmppath[MAXPGPATH];
+	int			pid;
 
 	/*
 	 * If some other backend ran this code concurrently with us, we might try
@@ -606,6 +627,8 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	sprintf(path, "pg_replslot/%s", NameStr(slot->data.name));
 	sprintf(tmppath, "pg_replslot/%s.tmp", NameStr(slot->data.name));
 
+	pid = slot->active_pid;
+
 	/*
 	 * Rename the slot directory on disk, so that we'll no longer recognize
 	 * this as a valid slot.  Note that if this fails, we've got to mark the
@@ -636,6 +659,14 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 		slot->active_pid = 0;
 		SpinLockRelease(&slot->mutex);
 
+		ereport(LOG,
+				(errmsg("replication slot \"%s\" becomes inactive",
+						NameStr(slot->data.name)),
+				 errdetail("The process with PID %d released it.", pid)));
+
+		/* let's not emit the above message twice */
+		pid = -1;
+
 		/* wake up anyone waiting on this slot */
 		ConditionVariableBroadcast(&slot->active_cv);
 
@@ -645,6 +676,7 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 						path, tmppath)));
 	}
 
+
 	/*
 	 * The slot is definitely gone.  Lock out concurrent scans of the array
 	 * long enough to kill it.  It's OK to clear the active PID here without
@@ -658,6 +690,13 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	slot->active_pid = 0;
 	slot->in_use = false;
 	LWLockRelease(ReplicationSlotControlLock);
+
+	if (pid > 0)
+		ereport(LOG,
+				(errmsg("replication slot \"%s\" becomes inactive",
+						NameStr(slot->data.name)),
+				 errdetail("The process with PID %d released it.", pid)));
+
 	ConditionVariableBroadcast(&slot->active_cv);
 
 	/*
-- 
2.25.1

#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#4)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

At Tue, 14 Dec 2021 19:04:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I think the PIDs are log-worthy for sure, but it's not clear to me that
it is desirable to write them to the persistent state file. In case of
crashes, the log should serve just fine to aid root cause investigation
-- in fact even better than the persistent file, where the data would be
lost as soon as the next client acquires that slot.

Thanks. +1 to log a message at LOG level whenever a replication slot
becomes active (gets assigned a valid pid to active_pid) and becomes
inactive(gets assigned 0 to active_pid). Having said that, isn't it
also helpful if we write a bool (1 byte character) whenever the slot
becomes active and inactive to the disk?

Here's the patch that adds a LOG message whenever a replication slot
becomes active and inactive. These logs will be extremely useful on
production servers to debug and analyze inactive replication slot
issues.

Thoughts?

If I create a replication slot, I saw the following lines in server log.

[22054:client backend] LOG: replication slot "s1" becomes active
[22054:client backend] DETAIL: The process with PID 22054 acquired it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
[22054:client backend] LOG: replication slot "s1" becomes inactive
[22054:client backend] DETAIL: The process with PID 22054 released it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');

They are apparently too much as if they were DEBUG3 lines. The
process PID shown is of the process the slot operations took place so
I think it conveys no information. The STATEMENT lines are also noisy
for non-ERROR emssages. Couldn't we hide that line?

That is, how about making the log lines as simple as the follows?

[17156:walsender] LOG: acquired replication slot "s1"
[17156:walsender] LOG: released replication slot "s1"

I think in the first place we don't need this log lines at slot
creation since it is actually not acquirement nor releasing of a slot.

It behaves in a strange way when executing pg_basebackup.

[22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active
[22864:walsender] DETAIL: The process with PID 22864 acquired it.
[22864:walsender] STATEMENT: CREATE_REPLICATION_SLOT "pg_basebackup_22864" TEMPORARY PHYSICAL ( RESERVE_WAL)
[22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active
[22864:walsender] DETAIL: The process with PID 22864 acquired it.
[22864:walsender] STATEMENT: START_REPLICATION SLOT "pg_basebackup_22864" 0/6000000 TIMELINE 1
[22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes inactive
[22864:walsender] DETAIL: The process with PID 22864 released it.

The slot is acquired twice then released once. It is becuase the
patch doesn't emit "becomes inactive" line when releasing a temporary
slot. However, I'd rather think we don't need the first 'become
active' line like the previous example.

@@ -658,6 +690,13 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	slot->active_pid = 0;
 	slot->in_use = false;
 	LWLockRelease(ReplicationSlotControlLock);
+
+	if (pid > 0)
+		ereport(LOG,
+				(errmsg("replication slot \"%s\" becomes inactive",
+						NameStr(slot->data.name)),
+				 errdetail("The process with PID %d released it.", pid)));
+

This is wrong. I see a "become inactive" message if I droped an
"inactive" replication slot. The reason the inactive slot looks as if
it were acquired is it is temporarily aquired as a preparing step of
dropping.

Even assuming that the log lines are simplified to this extent, I
still see it a bit strange that the "becomes acitve (or acruied)"
message shows alone without having a message like "replication
connection accepted". But that would be another issue even if it is
true.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#6Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#5)
1 attachment(s)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Wed, Dec 15, 2021 at 8:32 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Here's the patch that adds a LOG message whenever a replication slot
becomes active and inactive. These logs will be extremely useful on
production servers to debug and analyze inactive replication slot
issues.

Thoughts?

If I create a replication slot, I saw the following lines in server log.

[22054:client backend] LOG: replication slot "s1" becomes active
[22054:client backend] DETAIL: The process with PID 22054 acquired it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
[22054:client backend] LOG: replication slot "s1" becomes inactive
[22054:client backend] DETAIL: The process with PID 22054 released it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');

They are apparently too much as if they were DEBUG3 lines. The
process PID shown is of the process the slot operations took place so
I think it conveys no information. The STATEMENT lines are also noisy
for non-ERROR emssages. Couldn't we hide that line?

That is, how about making the log lines as simple as the follows?

[17156:walsender] LOG: acquired replication slot "s1"
[17156:walsender] LOG: released replication slot "s1"

Thanks for taking a look at the patch. Here's what I've come up with:

for drops:
2021-12-28 06:39:34.963 UTC [2541600] LOG: acquired persistent
physical replication slot "myslot1"
2021-12-28 06:39:34.980 UTC [2541600] LOG: dropped persistent
physical replication slot "myslot1"
2021-12-28 06:47:39.994 UTC [2544153] LOG: acquired persistent
logical replication slot "myslot2"
2021-12-28 06:47:40.003 UTC [2544153] LOG: dropped persistent logical
replication slot "myslot2"

for creates:
2021-12-28 06:39:46.859 UTC [2541600] LOG: created persistent
physical replication slot "myslot1"
2021-12-28 06:39:46.859 UTC [2541600] LOG: released persistent
physical replication slot "myslot1"
2021-12-28 06:45:20.037 UTC [2544153] LOG: created persistent logical
replication slot "myslot2"
2021-12-28 06:45:20.058 UTC [2544153] LOG: released persistent
logical replication slot "myslot2"

for pg_basebackup:
2021-12-28 06:41:04.601 UTC [2542686] LOG: created temporary physical
replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.602 UTC [2542686] LOG: released temporary
physical replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.602 UTC [2542686] LOG: acquired temporary
physical replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.867 UTC [2542686] LOG: released temporary
physical replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.954 UTC [2542686] LOG: dropped temporary physical
replication slot "pg_basebackup_2542686"

The amount of logs may seem noisy, but they do help a lot given the
fact that the server generates much more noise, for instance, the
server logs the syntax error statements. And, the replication slots
don't get created/dropped every now and then (at max, the
pg_basebackup if at all used and configured to take the backups, say,
every 24hrs or so). With the above set of logs debugging for inactive
replication slots becomes easier. One can find the root cause and
answer questions like "why there was a huge WAL file growth at some
point or when did a replication slot become inactive or how much time
a replication slot was inactive or when did a standby disconnected and
connected again or when did a pg_receivewal or pg_recvlogical
connected and disconnected so on.".

Here's the v2 patch. Please provide your thoughts.

Regards,
Bharath Rupireddy.

Attachments:

v2-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v2-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload
From ddfe0a371599bc7bf9af6ba2e7dfad04673fb372 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 28 Dec 2021 06:53:16 +0000
Subject: [PATCH v2] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 src/backend/replication/slot.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..2fdb60927f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -337,6 +337,13 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 
 	/* Let everybody know we've modified this slot */
 	ConditionVariableBroadcast(&slot->active_cv);
+
+	ereport(LOG,
+			(errmsg("created %s %s replication slot \"%s\"",
+					(slot->data.persistency == RS_TEMPORARY ? "temporary" : "persistent"),
+					(SlotIsPhysical(slot) ? "physical" : "logical"),
+					NameStr(slot->data.name)),
+			 errhidestmt(true)));
 }
 
 /*
@@ -457,6 +464,13 @@ retry:
 
 	/* We made this slot active, so it's ours now. */
 	MyReplicationSlot = s;
+
+	ereport(LOG,
+			(errmsg("acquired %s %s replication slot \"%s\"",
+					(s->data.persistency == RS_TEMPORARY ? "temporary" : "persistent"),
+					(SlotIsPhysical(s) ? "physical" : "logical"),
+					NameStr(s->data.name)),
+			 errhidestmt(true)));
 }
 
 /*
@@ -516,6 +530,13 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	ereport(LOG,
+			(errmsg("released %s %s replication slot \"%s\"",
+					(slot->data.persistency == RS_TEMPORARY ? "temporary" : "persistent"),
+					(SlotIsPhysical(slot) ? "physical" : "logical"),
+					NameStr(slot->data.name)),
+			 errhidestmt(true)));
 }
 
 /*
@@ -700,6 +721,13 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * a slot while we're still cleaning up the detritus of the old one.
 	 */
 	LWLockRelease(ReplicationSlotAllocationLock);
+
+	ereport(LOG,
+			(errmsg("dropped %s %s replication slot \"%s\"",
+					(slot->data.persistency == RS_TEMPORARY ? "temporary" : "persistent"),
+					(SlotIsPhysical(slot) ? "physical" : "logical"),
+					NameStr(slot->data.name)),
+			 errhidestmt(true)));
 }
 
 /*
-- 
2.25.1

#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#6)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

At Tue, 28 Dec 2021 12:28:07 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

On Wed, Dec 15, 2021 at 8:32 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Here's the patch that adds a LOG message whenever a replication slot
becomes active and inactive. These logs will be extremely useful on
production servers to debug and analyze inactive replication slot
issues.

Thoughts?

If I create a replication slot, I saw the following lines in server log.

[22054:client backend] LOG: replication slot "s1" becomes active
[22054:client backend] DETAIL: The process with PID 22054 acquired it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
[22054:client backend] LOG: replication slot "s1" becomes inactive
[22054:client backend] DETAIL: The process with PID 22054 released it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');

They are apparently too much as if they were DEBUG3 lines. The
process PID shown is of the process the slot operations took place so
I think it conveys no information. The STATEMENT lines are also noisy
for non-ERROR emssages. Couldn't we hide that line?

That is, how about making the log lines as simple as the follows?

[17156:walsender] LOG: acquired replication slot "s1"
[17156:walsender] LOG: released replication slot "s1"

Thanks for taking a look at the patch. Here's what I've come up with:

for drops:

(two log lines per slot: acquire->drop)

for creates:

(two log lines per slot: create->release)
..

Theses are still needlessly verbose. Even for those who want slot
activities to be logged are not interested in this detail. This is
still debug logs in that sense.

The amount of logs may seem noisy, but they do help a lot given the
fact that the server generates much more noise, for instance, the
server logs the syntax error statements. And, the replication slots
don't get created/dropped every now and then (at max, the
pg_basebackup if at all used and configured to take the backups, say,
every 24hrs or so). With the above set of logs debugging for inactive

In a nearby thread, there was a discussion that checkpoint logs are
too noisy to defaultly turn on. Finally it is committed but I don't
think this is committed as is as it is more verbose (IMV) than the
checkpoint logs. Thus this logs need to be muteable. Meanwhile I
don't think we willingly add a new knob for this feature. I think we
can piggy-back on log_replication_commands for the purpose, changing
its meaning slightly to "log replication commands and related
activity".

replication slots becomes easier. One can find the root cause and
answer questions like "why there was a huge WAL file growth at some
point or when did a replication slot become inactive or how much time
a replication slot was inactive or when did a standby disconnected and
connected again or when did a pg_receivewal or pg_recvlogical
connected and disconnected so on.".

I don't deny it is useful in that cases. If you are fine that the
logs are debug-only, making them DEBUG1 would work. But I don't think
you are fine with that since I think you are going to turn on them on
production systems.

Here's the v2 patch. Please provide your thoughts.

Thanks! I have three comments on this version.

- I still think "acquire/release" logs on creation/dropping should be
silenced. Adding the third parameter to ReplicationSlotAcquire()
that can mute the acquiring (and as well as the corresponding
releasing) log will work.

- Need to mute the logs by log_replication_commands. (We could add
another choice for the variable for this purpose but I think we
don't need it.)

- The messages are not translatable as the variable parts are
adjectives. They need to consist of static sentences. The
combinations of the two properties are 6 (note that persistence is
tristate) but I don't think we accept that complexity for the
information. So I recommend to just remove the variable parts from
the messages.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#8Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#7)
1 attachment(s)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

Here's the v2 patch. Please provide your thoughts.

Thanks! I have three comments on this version.

Thanks for your review.

- I still think "acquire/release" logs on creation/dropping should be
silenced. Adding the third parameter to ReplicationSlotAcquire()
that can mute the acquiring (and as well as the corresponding
releasing) log will work.

Done.

can piggy-back on log_replication_commands for the purpose, changing
its meaning slightly to "log replication commands and related
activity".
- Need to mute the logs by log_replication_commands. (We could add
another choice for the variable for this purpose but I think we
don't need it.)

Done.

- The messages are not translatable as the variable parts are
adjectives. They need to consist of static sentences. The
combinations of the two properties are 6 (note that persistence is
tristate) but I don't think we accept that complexity for the
information. So I recommend to just remove the variable parts from
the messages.

Done.

Here's v3, please review it further.

Regards,
Bharath Rupireddy.

Attachments:

v3-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/x-patch; name=v3-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload
From 46792e7b33c15afeda83769655e2512674c33269 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sun, 9 Jan 2022 09:38:36 +0000
Subject: [PATCH v3] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml                      |  6 ++--
 .../replication/logical/logicalfuncs.c        |  4 +--
 src/backend/replication/slot.c                | 30 ++++++++++++++++---
 src/backend/replication/slotfuncs.c           | 10 +++----
 src/backend/replication/walsender.c           | 12 ++++----
 src/backend/storage/lmgr/proc.c               |  2 +-
 src/backend/tcop/postgres.c                   |  2 +-
 src/backend/utils/misc/guc.c                  |  2 +-
 src/include/replication/slot.h                |  4 +--
 9 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..5f95fdffc8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7284,9 +7284,9 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is <literal>off</literal>.
         Only superusers can change this setting.
        </para>
       </listitem>
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 4f633888b4..ef19e48719 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -215,7 +215,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 	else
 		end_of_wal = GetXLogReplayRecPtr(NULL);
 
-	ReplicationSlotAcquire(NameStr(*name), true);
+	ReplicationSlotAcquire(NameStr(*name), true, true);
 
 	PG_TRY();
 	{
@@ -331,7 +331,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 		/* free context, call shutdown callback */
 		FreeDecodingContext(ctx);
 
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 		InvalidateSystemCaches();
 	}
 	PG_CATCH();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index e5e0cf8768..fbb02bb036 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -337,6 +337,11 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 
 	/* Let everybody know we've modified this slot */
 	ConditionVariableBroadcast(&slot->active_cv);
+
+	ereport(log_replication_commands ? LOG : DEBUG1,
+			(errmsg("created replication slot \"%s\"",
+					NameStr(slot->data.name)),
+			 errhidestmt(true)));
 }
 
 /*
@@ -377,7 +382,7 @@ SearchNamedReplicationSlot(const char *name, bool need_lock)
  * nowait is false, we sleep until the slot is released by the owning process.
  */
 void
-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait, bool msg_ok)
 {
 	ReplicationSlot *s;
 	int			active_pid;
@@ -457,6 +462,12 @@ retry:
 
 	/* We made this slot active, so it's ours now. */
 	MyReplicationSlot = s;
+
+	if (msg_ok)
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				(errmsg("acquired replication slot \"%s\"",
+						NameStr(s->data.name)),
+				 errhidestmt(true)));
 }
 
 /*
@@ -466,7 +477,7 @@ retry:
  * Resources this slot requires will be preserved.
  */
 void
-ReplicationSlotRelease(void)
+ReplicationSlotRelease(bool msg_ok)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 
@@ -516,6 +527,12 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (msg_ok)
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				(errmsg("released replication slot \"%s\"",
+						NameStr(slot->data.name)),
+				 errhidestmt(true)));
 }
 
 /*
@@ -564,7 +581,7 @@ ReplicationSlotDrop(const char *name, bool nowait)
 {
 	Assert(MyReplicationSlot == NULL);
 
-	ReplicationSlotAcquire(name, nowait);
+	ReplicationSlotAcquire(name, nowait, false);
 
 	ReplicationSlotDropAcquired();
 }
@@ -700,6 +717,11 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * a slot while we're still cleaning up the detritus of the old one.
 	 */
 	LWLockRelease(ReplicationSlotAllocationLock);
+
+	ereport(log_replication_commands ? LOG : DEBUG1,
+			(errmsg("dropped replication slot \"%s\"",
+					NameStr(slot->data.name)),
+			 errhidestmt(true)));
 }
 
 /*
@@ -1288,7 +1310,7 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			/* Make sure the invalidated state persists across server restart */
 			ReplicationSlotMarkDirty();
 			ReplicationSlotSave();
-			ReplicationSlotRelease();
+			ReplicationSlotRelease(true);
 
 			ereport(LOG,
 					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index ae6316d908..6737c1bb78 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -99,7 +99,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
 	tuple = heap_form_tuple(tupdesc, values, nulls);
 	result = HeapTupleGetDatum(tuple);
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 
 	PG_RETURN_DATUM(result);
 }
@@ -201,7 +201,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	/* ok, slot is now fully created, mark it as persistent if needed */
 	if (!temporary)
 		ReplicationSlotPersist();
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 
 	PG_RETURN_DATUM(result);
 }
@@ -631,7 +631,7 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 		moveto = Min(moveto, GetXLogReplayRecPtr(NULL));
 
 	/* Acquire the slot so we "own" it */
-	ReplicationSlotAcquire(NameStr(*slotname), true);
+	ReplicationSlotAcquire(NameStr(*slotname), true, true);
 
 	/* A slot whose restart_lsn has never been reserved cannot be advanced */
 	if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
@@ -674,7 +674,7 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	ReplicationSlotsComputeRequiredXmin(false);
 	ReplicationSlotsComputeRequiredLSN();
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(true);
 
 	/* Return the reached position. */
 	values[1] = LSNGetDatum(endlsn);
@@ -905,7 +905,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 	tuple = heap_form_tuple(tupdesc, values, nulls);
 	result = HeapTupleGetDatum(tuple);
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 
 	PG_RETURN_DATUM(result);
 }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 4cf95ce439..52ec13088e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -304,7 +304,7 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 
 	ReplicationSlotCleanup();
 
@@ -695,7 +695,7 @@ StartReplication(StartReplicationCmd *cmd)
 
 	if (cmd->slotname)
 	{
-		ReplicationSlotAcquire(cmd->slotname, true);
+		ReplicationSlotAcquire(cmd->slotname, true, true);
 		if (SlotIsLogical(MyReplicationSlot))
 			ereport(ERROR,
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
@@ -841,7 +841,7 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
@@ -1220,7 +1220,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	do_tup_output(tstate, values, nulls);
 	end_tup_output(tstate);
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 }
 
 /*
@@ -1247,7 +1247,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 
 	Assert(!MyReplicationSlot);
 
-	ReplicationSlotAcquire(cmd->slotname, true);
+	ReplicationSlotAcquire(cmd->slotname, true, true);
 
 	if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
 		ereport(ERROR,
@@ -1316,7 +1316,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	WalSndLoop(XLogSendLogical);
 
 	FreeDecodingContext(logical_decoding_ctx);
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(true);
 
 	replication_active = false;
 	if (got_STOPPING)
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e306b04738..cb9fe4dc6f 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -832,7 +832,7 @@ ProcKill(int code, Datum arg)
 
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index fda2e9360e..1bd192c03f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4265,7 +4265,7 @@ PostgresMain(const char *dbname, const char *username)
 		 * ensuring we'll correctly cleanup on FATAL errors as well.
 		 */
 		if (MyReplicationSlot != NULL)
-			ReplicationSlotRelease();
+			ReplicationSlotRelease(true);
 
 		/* We also want to cleanup temporary slots on error. */
 		ReplicationSlotCleanup();
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fc5cbc09a..90da7fa20e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1373,7 +1373,7 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 	{
 		{"log_replication_commands", PGC_SUSET, LOGGING_WHAT,
-			gettext_noop("Logs each replication command."),
+			gettext_noop("Logs each replication command and related activity."),
 			NULL
 		},
 		&log_replication_commands,
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index f833b1d6dc..af8c7b5baa 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -199,8 +199,8 @@ extern void ReplicationSlotCreate(const char *name, bool db_specific,
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
-extern void ReplicationSlotAcquire(const char *name, bool nowait);
-extern void ReplicationSlotRelease(void);
+extern void ReplicationSlotAcquire(const char *name, bool nowait, bool msg_ok);
+extern void ReplicationSlotRelease(bool msg_ok);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
-- 
2.25.1

#9Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#8)
1 attachment(s)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Mon, Jan 10, 2022 at 6:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Here's the v2 patch. Please provide your thoughts.

Thanks! I have three comments on this version.

Thanks for your review.

- I still think "acquire/release" logs on creation/dropping should be
silenced. Adding the third parameter to ReplicationSlotAcquire()
that can mute the acquiring (and as well as the corresponding
releasing) log will work.

Done.

can piggy-back on log_replication_commands for the purpose, changing
its meaning slightly to "log replication commands and related
activity".
- Need to mute the logs by log_replication_commands. (We could add
another choice for the variable for this purpose but I think we
don't need it.)

Done.

- The messages are not translatable as the variable parts are
adjectives. They need to consist of static sentences. The
combinations of the two properties are 6 (note that persistence is
tristate) but I don't think we accept that complexity for the
information. So I recommend to just remove the variable parts from
the messages.

Done.

Here's v3, please review it further.

Here's the rebased v4 patch.

Regards,
Bharath Rupireddy.

Attachments:

v4-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v4-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload
From d6ee893d72931392d7fe01f2f2ce4b1340451f5d Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sun, 27 Feb 2022 09:44:03 +0000
Subject: [PATCH v4] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml                      |  6 +--
 .../replication/logical/logicalfuncs.c        |  4 +-
 src/backend/replication/slot.c                | 38 ++++++++++++++-----
 src/backend/replication/slotfuncs.c           | 10 ++---
 src/backend/replication/walsender.c           | 12 +++---
 src/backend/tcop/postgres.c                   |  2 +-
 src/backend/utils/misc/guc.c                  |  2 +-
 src/include/replication/slot.h                |  4 +-
 8 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..b4e616bf52 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7322,9 +7322,9 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is <literal>off</literal>.
         Only superusers can change this setting.
        </para>
       </listitem>
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 3bd770a3ba..87e73b37cc 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -216,7 +216,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 	else
 		end_of_wal = GetXLogReplayRecPtr(NULL);
 
-	ReplicationSlotAcquire(NameStr(*name), true);
+	ReplicationSlotAcquire(NameStr(*name), true, true);
 
 	PG_TRY();
 	{
@@ -330,7 +330,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 		/* free context, call shutdown callback */
 		FreeDecodingContext(ctx);
 
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 		InvalidateSystemCaches();
 	}
 	PG_CATCH();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index caa6b29756..838e4019c1 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -183,7 +183,7 @@ ReplicationSlotShmemExit(int code, Datum arg)
 
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
@@ -366,6 +366,10 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 
 	/* Let everybody know we've modified this slot */
 	ConditionVariableBroadcast(&slot->active_cv);
+
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("created replication slot \"%s\"",
+					NameStr(slot->data.name))));
 }
 
 /*
@@ -406,7 +410,7 @@ SearchNamedReplicationSlot(const char *name, bool need_lock)
  * nowait is false, we sleep until the slot is released by the owning process.
  */
 void
-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait, bool msg_ok)
 {
 	ReplicationSlot *s;
 	int			active_pid;
@@ -486,6 +490,11 @@ retry:
 
 	/* We made this slot active, so it's ours now. */
 	MyReplicationSlot = s;
+
+	if (msg_ok)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("acquired replication slot \"%s\"",
+						NameStr(s->data.name))));
 }
 
 /*
@@ -495,7 +504,7 @@ retry:
  * Resources this slot requires will be preserved.
  */
 void
-ReplicationSlotRelease(void)
+ReplicationSlotRelease(bool msg_ok)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 
@@ -545,6 +554,11 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (msg_ok)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("released replication slot \"%s\"",
+						NameStr(slot->data.name))));
 }
 
 /*
@@ -602,7 +616,7 @@ ReplicationSlotDrop(const char *name, bool nowait)
 {
 	Assert(MyReplicationSlot == NULL);
 
-	ReplicationSlotAcquire(name, nowait);
+	ReplicationSlotAcquire(name, nowait, false);
 
 	ReplicationSlotDropAcquired();
 }
@@ -634,7 +648,9 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	char		tmppath[MAXPGPATH];
 
 	/* temp debugging aid to analyze 019_replslot_limit failures */
-	elog(DEBUG3, "replication slot drop: %s: begin", NameStr(slot->data.name));
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("replication slot drop: %s: begin",
+					NameStr(slot->data.name))));
 
 	/*
 	 * If some other backend ran this code concurrently with us, we might try
@@ -686,8 +702,9 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 						path, tmppath)));
 	}
 
-	elog(DEBUG3, "replication slot drop: %s: removed on-disk",
-		 NameStr(slot->data.name));
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("replication slot drop: %s: removed on-disk",
+					NameStr(slot->data.name))));
 
 	/*
 	 * The slot is definitely gone.  Lock out concurrent scans of the array
@@ -745,8 +762,9 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 */
 	LWLockRelease(ReplicationSlotAllocationLock);
 
-	elog(DEBUG3, "replication slot drop: %s: done",
-		 NameStr(slot->data.name));
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("replication slot drop: %s: done",
+					NameStr(slot->data.name))));
 }
 
 /*
@@ -1341,7 +1359,7 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			/* Make sure the invalidated state persists across server restart */
 			ReplicationSlotMarkDirty();
 			ReplicationSlotSave();
-			ReplicationSlotRelease();
+			ReplicationSlotRelease(true);
 
 			ereport(LOG,
 					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 886899afd2..86572f9be9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -100,7 +100,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
 	tuple = heap_form_tuple(tupdesc, values, nulls);
 	result = HeapTupleGetDatum(tuple);
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 
 	PG_RETURN_DATUM(result);
 }
@@ -202,7 +202,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	/* ok, slot is now fully created, mark it as persistent if needed */
 	if (!temporary)
 		ReplicationSlotPersist();
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 
 	PG_RETURN_DATUM(result);
 }
@@ -630,7 +630,7 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 		moveto = Min(moveto, GetXLogReplayRecPtr(NULL));
 
 	/* Acquire the slot so we "own" it */
-	ReplicationSlotAcquire(NameStr(*slotname), true);
+	ReplicationSlotAcquire(NameStr(*slotname), true, true);
 
 	/* A slot whose restart_lsn has never been reserved cannot be advanced */
 	if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
@@ -673,7 +673,7 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	ReplicationSlotsComputeRequiredXmin(false);
 	ReplicationSlotsComputeRequiredLSN();
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(true);
 
 	/* Return the reached position. */
 	values[1] = LSNGetDatum(endlsn);
@@ -904,7 +904,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 	tuple = heap_form_tuple(tupdesc, values, nulls);
 	result = HeapTupleGetDatum(tuple);
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 
 	PG_RETURN_DATUM(result);
 }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5a718b1fe9..ebd036a8ab 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -305,7 +305,7 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 
 	ReplicationSlotCleanup();
 
@@ -696,7 +696,7 @@ StartReplication(StartReplicationCmd *cmd)
 
 	if (cmd->slotname)
 	{
-		ReplicationSlotAcquire(cmd->slotname, true);
+		ReplicationSlotAcquire(cmd->slotname, true, true);
 		if (SlotIsLogical(MyReplicationSlot))
 			ereport(ERROR,
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
@@ -842,7 +842,7 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
@@ -1221,7 +1221,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	do_tup_output(tstate, values, nulls);
 	end_tup_output(tstate);
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 }
 
 /*
@@ -1248,7 +1248,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 
 	Assert(!MyReplicationSlot);
 
-	ReplicationSlotAcquire(cmd->slotname, true);
+	ReplicationSlotAcquire(cmd->slotname, true, true);
 
 	if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
 		ereport(ERROR,
@@ -1317,7 +1317,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	WalSndLoop(XLogSendLogical);
 
 	FreeDecodingContext(logical_decoding_ctx);
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(true);
 
 	replication_active = false;
 	if (got_STOPPING)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3c7d08209f..d159982df7 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4273,7 +4273,7 @@ PostgresMain(const char *dbname, const char *username)
 		 * callback ensuring correct cleanup on FATAL errors.
 		 */
 		if (MyReplicationSlot != NULL)
-			ReplicationSlotRelease();
+			ReplicationSlotRelease(true);
 
 		/* We also want to cleanup temporary slots on error. */
 		ReplicationSlotCleanup();
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1e3650184b..3480890f0e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1378,7 +1378,7 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 	{
 		{"log_replication_commands", PGC_SUSET, LOGGING_WHAT,
-			gettext_noop("Logs each replication command."),
+			gettext_noop("Logs each replication command and related activity."),
 			NULL
 		},
 		&log_replication_commands,
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 24b30210c3..2ec6032cbf 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -199,8 +199,8 @@ extern void ReplicationSlotCreate(const char *name, bool db_specific,
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
-extern void ReplicationSlotAcquire(const char *name, bool nowait);
-extern void ReplicationSlotRelease(void);
+extern void ReplicationSlotAcquire(const char *name, bool nowait, bool msg_ok);
+extern void ReplicationSlotRelease(bool msg_ok);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
-- 
2.25.1

#10Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#9)
1 attachment(s)
add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Sun, Feb 27, 2022 at 3:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Jan 10, 2022 at 6:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Here's the v2 patch. Please provide your thoughts.

Thanks! I have three comments on this version.

Thanks for your review.

- I still think "acquire/release" logs on creation/dropping should be
silenced. Adding the third parameter to ReplicationSlotAcquire()
that can mute the acquiring (and as well as the corresponding
releasing) log will work.

Done.

can piggy-back on log_replication_commands for the purpose, changing
its meaning slightly to "log replication commands and related
activity".
- Need to mute the logs by log_replication_commands. (We could add
another choice for the variable for this purpose but I think we
don't need it.)

Done.

- The messages are not translatable as the variable parts are
adjectives. They need to consist of static sentences. The
combinations of the two properties are 6 (note that persistence is
tristate) but I don't think we accept that complexity for the
information. So I recommend to just remove the variable parts from
the messages.

Done.

Here's v3, please review it further.

Here's the rebased v4 patch.

Here's the rebased v5 patch.

Regards,
Bharath Rupireddy.

Attachments:

v5-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v5-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload
From 635bdfb7ac1d0fe57a8a0949290d34e5d2a35427 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 29 Apr 2022 09:58:33 +0000
Subject: [PATCH v5] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml                      |  7 ++--
 .../replication/logical/logicalfuncs.c        |  4 +-
 src/backend/replication/slot.c                | 38 ++++++++++++++-----
 src/backend/replication/slotfuncs.c           | 10 ++---
 src/backend/replication/walsender.c           | 12 +++---
 src/backend/tcop/postgres.c                   |  2 +-
 src/backend/utils/misc/guc.c                  |  2 +-
 src/include/replication/slot.h                |  4 +-
 8 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 03986946a8..b6b7b4d838 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7461,9 +7461,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
        </para>
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 6058d36e0d..11e66a5821 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -201,7 +201,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 	else
 		end_of_wal = GetXLogReplayRecPtr(NULL);
 
-	ReplicationSlotAcquire(NameStr(*name), true);
+	ReplicationSlotAcquire(NameStr(*name), true, true);
 
 	PG_TRY();
 	{
@@ -315,7 +315,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 		/* free context, call shutdown callback */
 		FreeDecodingContext(ctx);
 
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 		InvalidateSystemCaches();
 	}
 	PG_CATCH();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 5c778f5333..d9246e2ebc 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -183,7 +183,7 @@ ReplicationSlotShmemExit(int code, Datum arg)
 
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
@@ -366,6 +366,10 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 
 	/* Let everybody know we've modified this slot */
 	ConditionVariableBroadcast(&slot->active_cv);
+
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("created replication slot \"%s\"",
+					NameStr(slot->data.name))));
 }
 
 /*
@@ -422,7 +426,7 @@ ReplicationSlotIndex(ReplicationSlot *slot)
  * nowait is false, we sleep until the slot is released by the owning process.
  */
 void
-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait, bool msg_ok)
 {
 	ReplicationSlot *s;
 	int			active_pid;
@@ -503,6 +507,11 @@ retry:
 	/* We made this slot active, so it's ours now. */
 	MyReplicationSlot = s;
 
+	if (msg_ok)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("acquired replication slot \"%s\"",
+						NameStr(s->data.name))));
+
 	/*
 	 * The call to pgstat_acquire_replslot() protects against stats for
 	 * a different slot, from before a restart or such, being present during
@@ -519,7 +528,7 @@ retry:
  * Resources this slot requires will be preserved.
  */
 void
-ReplicationSlotRelease(void)
+ReplicationSlotRelease(bool msg_ok)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 
@@ -569,6 +578,11 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (msg_ok)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("released replication slot \"%s\"",
+						NameStr(slot->data.name))));
 }
 
 /*
@@ -626,7 +640,7 @@ ReplicationSlotDrop(const char *name, bool nowait)
 {
 	Assert(MyReplicationSlot == NULL);
 
-	ReplicationSlotAcquire(name, nowait);
+	ReplicationSlotAcquire(name, nowait, false);
 
 	ReplicationSlotDropAcquired();
 }
@@ -658,7 +672,9 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	char		tmppath[MAXPGPATH];
 
 	/* temp debugging aid to analyze 019_replslot_limit failures */
-	elog(DEBUG3, "replication slot drop: %s: begin", NameStr(slot->data.name));
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("replication slot drop: %s: begin",
+					NameStr(slot->data.name))));
 
 	/*
 	 * If some other backend ran this code concurrently with us, we might try
@@ -710,8 +726,9 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 						path, tmppath)));
 	}
 
-	elog(DEBUG3, "replication slot drop: %s: removed on-disk",
-		 NameStr(slot->data.name));
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("replication slot drop: %s: removed on-disk",
+					NameStr(slot->data.name))));
 
 	/*
 	 * The slot is definitely gone.  Lock out concurrent scans of the array
@@ -768,8 +785,9 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 */
 	LWLockRelease(ReplicationSlotAllocationLock);
 
-	elog(DEBUG3, "replication slot drop: %s: done",
-		 NameStr(slot->data.name));
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("replication slot drop: %s: done",
+					NameStr(slot->data.name))));
 }
 
 /*
@@ -1364,7 +1382,7 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			/* Make sure the invalidated state persists across server restart */
 			ReplicationSlotMarkDirty();
 			ReplicationSlotSave();
-			ReplicationSlotRelease();
+			ReplicationSlotRelease(true);
 
 			ereport(LOG,
 					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index ca945994ef..368ab444b4 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -100,7 +100,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
 	tuple = heap_form_tuple(tupdesc, values, nulls);
 	result = HeapTupleGetDatum(tuple);
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 
 	PG_RETURN_DATUM(result);
 }
@@ -202,7 +202,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	/* ok, slot is now fully created, mark it as persistent if needed */
 	if (!temporary)
 		ReplicationSlotPersist();
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 
 	PG_RETURN_DATUM(result);
 }
@@ -605,7 +605,7 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 		moveto = Min(moveto, GetXLogReplayRecPtr(NULL));
 
 	/* Acquire the slot so we "own" it */
-	ReplicationSlotAcquire(NameStr(*slotname), true);
+	ReplicationSlotAcquire(NameStr(*slotname), true, true);
 
 	/* A slot whose restart_lsn has never been reserved cannot be advanced */
 	if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
@@ -648,7 +648,7 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	ReplicationSlotsComputeRequiredXmin(false);
 	ReplicationSlotsComputeRequiredLSN();
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(true);
 
 	/* Return the reached position. */
 	values[1] = LSNGetDatum(endlsn);
@@ -879,7 +879,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 	tuple = heap_form_tuple(tupdesc, values, nulls);
 	result = HeapTupleGetDatum(tuple);
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 
 	PG_RETURN_DATUM(result);
 }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 63a818140b..e4879a43f5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -322,7 +322,7 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 
 	ReplicationSlotCleanup();
 
@@ -713,7 +713,7 @@ StartReplication(StartReplicationCmd *cmd)
 
 	if (cmd->slotname)
 	{
-		ReplicationSlotAcquire(cmd->slotname, true);
+		ReplicationSlotAcquire(cmd->slotname, true, true);
 		if (SlotIsLogical(MyReplicationSlot))
 			ereport(ERROR,
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
@@ -859,7 +859,7 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
-		ReplicationSlotRelease();
+		ReplicationSlotRelease(true);
 
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
@@ -1237,7 +1237,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	do_tup_output(tstate, values, nulls);
 	end_tup_output(tstate);
 
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(false);
 }
 
 /*
@@ -1264,7 +1264,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 
 	Assert(!MyReplicationSlot);
 
-	ReplicationSlotAcquire(cmd->slotname, true);
+	ReplicationSlotAcquire(cmd->slotname, true, true);
 
 	if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
 		ereport(ERROR,
@@ -1333,7 +1333,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	WalSndLoop(XLogSendLogical);
 
 	FreeDecodingContext(logical_decoding_ctx);
-	ReplicationSlotRelease();
+	ReplicationSlotRelease(true);
 
 	replication_active = false;
 	if (got_STOPPING)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 304cce135a..144d081715 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4295,7 +4295,7 @@ PostgresMain(const char *dbname, const char *username)
 		 * callback ensuring correct cleanup on FATAL errors.
 		 */
 		if (MyReplicationSlot != NULL)
-			ReplicationSlotRelease();
+			ReplicationSlotRelease(true);
 
 		/* We also want to cleanup temporary slots on error. */
 		ReplicationSlotCleanup();
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8e9b71375c..50dfd5cd3c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1424,7 +1424,7 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 	{
 		{"log_replication_commands", PGC_SUSET, LOGGING_WHAT,
-			gettext_noop("Logs each replication command."),
+			gettext_noop("Logs each replication command and related activity."),
 			NULL
 		},
 		&log_replication_commands,
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ee63c4cf4..70871c6785 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -199,8 +199,8 @@ extern void ReplicationSlotCreate(const char *name, bool db_specific,
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
-extern void ReplicationSlotAcquire(const char *name, bool nowait);
-extern void ReplicationSlotRelease(void);
+extern void ReplicationSlotAcquire(const char *name, bool nowait, bool msg_ok);
+extern void ReplicationSlotRelease(bool msg_ok);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
-- 
2.25.1

#11Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Bharath Rupireddy (#10)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

If I read the code right, this patch emits logs when
pg_logical_slot_get_changes and pg_replication_slot_advance SQL
functions are called. Is this desirable/useful, for the use case you
stated at start of thread? I think it is most likely pointless. If you
get rid of those, then the only acquisitions that would log messages are
those in StartReplication and StartLogicalReplication. So I wonder if
it would be better to leave the API of ReplicationSlotAcquire() alone,
and instead make StartReplication and StartLogicalReplication
responsible for those messages.

I didn't look at the release-side messages you're adding, but I suppose
it should be symmetrical.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El hombre nunca sabe de lo que es capaz hasta que lo intenta" (C. Dickens)

#12Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#11)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

If I read the code right, this patch emits logs when
pg_logical_slot_get_changes and pg_replication_slot_advance SQL
functions are called. Is this desirable/useful, for the use case you
stated at start of thread? I think it is most likely pointless. If you
get rid of those, then the only acquisitions that would log messages are
those in StartReplication and StartLogicalReplication. So I wonder if
it would be better to leave the API of ReplicationSlotAcquire() alone,
and instead make StartReplication and StartLogicalReplication
responsible for those messages.

I didn't look at the release-side messages you're adding, but I suppose
it should be symmetrical.

Adding the messages right after ReplicationSlotAcquire in the
StartReplication and StartLogicalReplication looks okay, but, if we
just add "released replication slot \"%s\" for logical/physical
replication". in StartReplication and StartLogicalReplication right
after ReplicationSlotRelease, how about ReplicationSlotRelease in
WalSndErrorCleanup and ReplicationSlotShmemExit?

The whole idea is to get to know when and how much time a slot was
inactive/unused when log_replication_commands is set to true.

We can think of adding a timestamp column to on-disk replication slot
data but that will be too much.

Regards,
Bharath Rupireddy.

#13Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#12)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, May 5, 2022 at 2:57 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

If I read the code right, this patch emits logs when
pg_logical_slot_get_changes and pg_replication_slot_advance SQL
functions are called. Is this desirable/useful, for the use case you
stated at start of thread? I think it is most likely pointless. If you
get rid of those, then the only acquisitions that would log messages are
those in StartReplication and StartLogicalReplication. So I wonder if
it would be better to leave the API of ReplicationSlotAcquire() alone,
and instead make StartReplication and StartLogicalReplication
responsible for those messages.

I didn't look at the release-side messages you're adding, but I suppose
it should be symmetrical.

Here's the v6 patch, a much simpler one - no changes to any of the
existing function APIs. Please see the sample logs at [1]2022-07-25 12:30:14.847 UTC [152873] LOG: acquired physical replication slot "foo" 2022-07-25 12:30:20.878 UTC [152873] LOG: released physical replication slot "foo". There's a
bit of duplicate code in the v6 patch, if the overall approach looks
okay, I can remove that too in the next version of the patch.

Thoughts?

[1]: 2022-07-25 12:30:14.847 UTC [152873] LOG: acquired physical replication slot "foo" 2022-07-25 12:30:20.878 UTC [152873] LOG: released physical replication slot "foo"
2022-07-25 12:30:14.847 UTC [152873] LOG: acquired physical
replication slot "foo"
2022-07-25 12:30:20.878 UTC [152873] LOG: released physical
replication slot "foo"

2022-07-25 12:49:18.023 UTC [168738] LOG: acquired logical
replication slot "bar"
2022-07-25 12:49:28.105 UTC [168738] LOG: released logical
replication slot "bar"

Regards,
Bharath Rupireddy.

Attachments:

v6-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/x-patch; name=v6-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload
From c57cfd8dd27eaf9e0c17c47769cb24b8ab961116 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 25 Jul 2022 12:51:26 +0000
Subject: [PATCH v6] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml            |  7 +++---
 src/backend/replication/slot.c      | 21 ++++++++++++++++
 src/backend/replication/walsender.c | 39 +++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e2d728e0c4..8bf3c2ab53 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7461,9 +7461,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
        </para>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 850b74936f..43563bec7e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -179,8 +179,29 @@ ReplicationSlotShmemExit(int code, Datum arg)
 {
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
+	{
+		bool is_physical;
+		char slotname[NAMEDATALEN] = {0};
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		strcpy(slotname, NameStr(MyReplicationSlot->data.name));
+
 		ReplicationSlotRelease();
 
+		if (is_physical)
+		{
+			ereport(log_replication_commands ? LOG : DEBUG3,
+					(errmsg("released physical replication slot \"%s\"",
+							slotname)));
+		}
+		else
+		{
+			ereport(log_replication_commands ? LOG : DEBUG3,
+					(errmsg("released logical replication slot \"%s\"",
+							slotname)));
+		}
+	}
+
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
 }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3a86786cc3..ebd34316e2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -322,8 +322,29 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
+	{
+		bool is_physical;
+		char slotname[NAMEDATALEN] = {0};
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		strcpy(slotname, NameStr(MyReplicationSlot->data.name));
+
 		ReplicationSlotRelease();
 
+		if (is_physical)
+		{
+			ereport(log_replication_commands ? LOG : DEBUG3,
+					(errmsg("released physical replication slot \"%s\"",
+							slotname)));
+		}
+		else
+		{
+			ereport(log_replication_commands ? LOG : DEBUG3,
+					(errmsg("released logical replication slot \"%s\"",
+							slotname)));
+		}
+	}
+
 	ReplicationSlotCleanup();
 
 	replication_active = false;
@@ -704,6 +725,10 @@ StartReplication(StartReplicationCmd *cmd)
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					 errmsg("cannot use a logical replication slot for physical replication")));
 
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("acquired physical replication slot \"%s\"",
+						cmd->slotname)));
+
 		/*
 		 * We don't need to verify the slot's restart_lsn here; instead we
 		 * rely on the caller requesting the starting point to use.  If the
@@ -844,8 +869,14 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
+	{
 		ReplicationSlotRelease();
 
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("released physical replication slot \"%s\"",
+						cmd->slotname)));
+	}
+
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
 	 * timeline.
@@ -1256,6 +1287,10 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 						cmd->slotname),
 				 errdetail("This slot has been invalidated because it exceeded the maximum reserved size.")));
 
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("acquired logical replication slot \"%s\"",
+					cmd->slotname)));
+
 	/*
 	 * Force a disconnect, so that the decoding code doesn't need to care
 	 * about an eventual switch from running in recovery, to running in a
@@ -1318,6 +1353,10 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	FreeDecodingContext(logical_decoding_ctx);
 	ReplicationSlotRelease();
 
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("released logical replication slot \"%s\"",
+					cmd->slotname)));
+
 	replication_active = false;
 	if (got_STOPPING)
 		proc_exit(0);
-- 
2.34.1

#14Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#13)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Mon, Jul 25, 2022 at 6:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Here's the v6 patch, a much simpler one - no changes to any of the
existing function APIs. Please see the sample logs at [1]. There's a
bit of duplicate code in the v6 patch, if the overall approach looks
okay, I can remove that too in the next version of the patch.

I modified the log_replication_commands description in guc_tables.c.
Please review the v7 patch further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v7-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v7-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload
From fc53b36e62c76bb29ad201842a7a6c1bdb94b6da Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 15 Sep 2022 05:05:22 +0000
Subject: [PATCH v7] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml            |  7 +++---
 src/backend/replication/slot.c      | 17 ++++++++++++++
 src/backend/replication/walsender.c | 35 +++++++++++++++++++++++++++++
 src/backend/utils/misc/guc_tables.c |  2 +-
 4 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 700914684d..55976b3df8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7482,9 +7482,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
        </para>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8fec1cb4a5..9814fc87de 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -180,8 +180,25 @@ ReplicationSlotShmemExit(int code, Datum arg)
 {
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
+	{
+		bool is_physical;
+		char slotname[NAMEDATALEN] = {0};
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		strcpy(slotname, NameStr(MyReplicationSlot->data.name));
+
 		ReplicationSlotRelease();
 
+		if (is_physical)
+			ereport(log_replication_commands ? LOG : DEBUG3,
+					(errmsg("released physical replication slot \"%s\"",
+							slotname)));
+		else
+			ereport(log_replication_commands ? LOG : DEBUG3,
+					(errmsg("released logical replication slot \"%s\"",
+							slotname)));
+	}
+
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
 }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 724010dbd9..3daaea362a 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -322,8 +322,25 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
+	{
+		bool is_physical;
+		char slotname[NAMEDATALEN] = {0};
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		strcpy(slotname, NameStr(MyReplicationSlot->data.name));
+
 		ReplicationSlotRelease();
 
+		if (is_physical)
+			ereport(log_replication_commands ? LOG : DEBUG3,
+					(errmsg("released physical replication slot \"%s\"",
+							slotname)));
+		else
+			ereport(log_replication_commands ? LOG : DEBUG3,
+					(errmsg("released logical replication slot \"%s\"",
+							slotname)));
+	}
+
 	ReplicationSlotCleanup();
 
 	replication_active = false;
@@ -704,6 +721,10 @@ StartReplication(StartReplicationCmd *cmd)
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					 errmsg("cannot use a logical replication slot for physical replication")));
 
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("acquired physical replication slot \"%s\"",
+						cmd->slotname)));
+
 		/*
 		 * We don't need to verify the slot's restart_lsn here; instead we
 		 * rely on the caller requesting the starting point to use.  If the
@@ -844,8 +865,14 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
+	{
 		ReplicationSlotRelease();
 
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("released physical replication slot \"%s\"",
+						cmd->slotname)));
+	}
+
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
 	 * timeline.
@@ -1256,6 +1283,10 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 						cmd->slotname),
 				 errdetail("This slot has been invalidated because it exceeded the maximum reserved size.")));
 
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("acquired logical replication slot \"%s\"",
+					cmd->slotname)));
+
 	/*
 	 * Force a disconnect, so that the decoding code doesn't need to care
 	 * about an eventual switch from running in recovery, to running in a
@@ -1318,6 +1349,10 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	FreeDecodingContext(logical_decoding_ctx);
 	ReplicationSlotRelease();
 
+	ereport(log_replication_commands ? LOG : DEBUG3,
+			(errmsg("released logical replication slot \"%s\"",
+					cmd->slotname)));
+
 	replication_active = false;
 	if (got_STOPPING)
 		proc_exit(0);
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 550e95056c..da1f90b4fa 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1177,7 +1177,7 @@ struct config_bool ConfigureNamesBool[] =
 	},
 	{
 		{"log_replication_commands", PGC_SUSET, LOGGING_WHAT,
-			gettext_noop("Logs each replication command."),
+			gettext_noop("Logs each replication command and related activity."),
 			NULL
 		},
 		&log_replication_commands,
-- 
2.34.1

#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#14)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Sep 15, 2022 at 10:39 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Jul 25, 2022 at 6:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Here's the v6 patch, a much simpler one - no changes to any of the
existing function APIs. Please see the sample logs at [1]. There's a
bit of duplicate code in the v6 patch, if the overall approach looks
okay, I can remove that too in the next version of the patch.

I modified the log_replication_commands description in guc_tables.c.
Please review the v7 patch further.

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

--
With Regards,
Amit Kapila.

#16Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#14)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

Hi, I had a quick look at the v7 patch.

You might consider to encapsulate some of this logic in new functions like:

void
LogReplicationSlotAquired(bool is_physical, char *slotname)
{
loglevel = log_replication_commands ? LOG : DEBUG3;

if (is_physical)
ereport(loglevel,
(errmsg("acquired physical replication slot \"%s\"", slotname)));
else
ereport(loglevel,
(errmsg("acquired logical replication slot \"%s\"", slotname)));
}

void
LogReplicationSlotReleased(bool is_physical, char *slotname)
{
loglevel = log_replication_commands ? LOG : DEBUG3;

if (is_physical)
ereport(loglevel,
(errmsg("released physical replication slot \"%s\"", slotname)));
else
ereport(loglevel,
(errmsg("released logical replication slot \"%s\"", slotname)));
}

~~

THEN

ReplicationSlotShmemExit and WalSndErrorCleanup can call it like:
if (MyReplicationSlot != NULL)
{
bool is_physical = SlotIsPhysical(MyReplicationSlot);
char *slotname = pstrdup(NameStr(MyReplicationSlot->data.name));

ReplicationSlotRelease();
LogReplicationSlotReleased(is_physical, slotname);
}

~

StartlReplication can call like:
LogReplicationSlotAquired(true, cmd->slotname);
...
LogReplicationSlotReleased(true, cmd->slotname);

~

StartLogicalReplication can call like:
LogReplicationSlotAquired(false, cmd->slotname);
...
LogReplicationSlotReleased(false, cmd->slotname);

~~~

TBH, I am not sure for the *current* code if the encapsulation is
worth the trouble or not. But maybe at least it helps message
consistency and will make it easier if future callers are needed. I
guess those functions could also give you some central point to
comment the intent of this logging? Feel free to take or leave this
code suggestion.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#17Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#15)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On 2023-Mar-22, Amit Kapila wrote:

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

I like Peter Smith's suggestion downthread.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#18Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#17)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Mar-22, Amit Kapila wrote:

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

I like Peter Smith's suggestion downthread.

+1. Please review the attached v8 patch further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v8-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v8-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload
From b0824cbe21d02d6826aead6cc7cf50c955160626 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 23 Mar 2023 17:21:12 +0000
Subject: [PATCH v8] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml            |  7 +++--
 src/backend/replication/slot.c      | 45 +++++++++++++++++++++++++++++
 src/backend/replication/walsender.c | 17 +++++++++++
 src/backend/utils/misc/guc_tables.c |  2 +-
 src/include/replication/slot.h      |  2 ++
 5 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 481f93cea1..f9b0d3c9a9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7593,9 +7593,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
        </para>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 2293c0c6fc..4dba028ab9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -180,7 +180,16 @@ ReplicationSlotShmemExit(int code, Datum arg)
 {
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
+	{
+		bool is_physical;
+		char *slotname;
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
 		ReplicationSlotRelease();
+		LogReplicationSlotReleased(is_physical, slotname);
+		pfree(slotname);
+	}
 
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
@@ -596,6 +605,42 @@ ReplicationSlotRelease(void)
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * Helper function to log a message after replication slot acquisition.
+ */
+void
+LogReplicationSlotAquired(bool is_physical, char *slotname)
+{
+	Assert(MyReplicationSlot != NULL);
+
+	if (is_physical)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("acquired physical replication slot \"%s\"",
+						slotname)));
+	else
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("acquired logical replication slot \"%s\"",
+						slotname)));
+}
+
+/*
+ * Helper function to log a message after replication slot release.
+ */
+void
+LogReplicationSlotReleased(bool is_physical, char *slotname)
+{
+	Assert(MyReplicationSlot == NULL);
+
+	if (is_physical)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("released physical replication slot \"%s\"",
+						slotname)));
+	else
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				(errmsg("released logical replication slot \"%s\"",
+						slotname)));
+}
+
 /*
  * Cleanup all temporary slots created in current session.
  */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 75e8363e24..38d25ed3c8 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -322,7 +322,16 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
+	{
+		bool is_physical;
+		char *slotname;
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
 		ReplicationSlotRelease();
+		LogReplicationSlotReleased(is_physical, slotname);
+		pfree(slotname);
+	}
 
 	ReplicationSlotCleanup();
 
@@ -703,6 +712,8 @@ StartReplication(StartReplicationCmd *cmd)
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					 errmsg("cannot use a logical replication slot for physical replication")));
 
+		LogReplicationSlotAquired(true, cmd->slotname);
+
 		/*
 		 * We don't need to verify the slot's restart_lsn here; instead we
 		 * rely on the caller requesting the starting point to use.  If the
@@ -843,7 +854,10 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
+	{
 		ReplicationSlotRelease();
+		LogReplicationSlotReleased(true, cmd->slotname);
+	}
 
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
@@ -1260,6 +1274,8 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 						cmd->slotname),
 				 errdetail("This slot has been invalidated because it exceeded the maximum reserved size.")));
 
+	LogReplicationSlotAquired(false, cmd->slotname);
+
 	/*
 	 * Force a disconnect, so that the decoding code doesn't need to care
 	 * about an eventual switch from running in recovery, to running in a
@@ -1321,6 +1337,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 
 	FreeDecodingContext(logical_decoding_ctx);
 	ReplicationSlotRelease();
+	LogReplicationSlotReleased(false, cmd->slotname);
 
 	replication_active = false;
 	if (got_STOPPING)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 1c0583fe26..ea41ccc7e2 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1203,7 +1203,7 @@ struct config_bool ConfigureNamesBool[] =
 	},
 	{
 		{"log_replication_commands", PGC_SUSET, LOGGING_WHAT,
-			gettext_noop("Logs each replication command."),
+			gettext_noop("Logs each replication command and related activity."),
 			NULL
 		},
 		&log_replication_commands,
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8872c80cdf..147e4d4371 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -202,6 +202,8 @@ extern void ReplicationSlotDrop(const char *name, bool nowait);
 
 extern void ReplicationSlotAcquire(const char *name, bool nowait);
 extern void ReplicationSlotRelease(void);
+extern void LogReplicationSlotAquired(bool is_physical, char *slotname);
+extern void LogReplicationSlotReleased(bool is_physical, char *slotname);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
-- 
2.34.1

#19Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#18)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Fri, Mar 24, 2023 at 4:33 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Mar-22, Amit Kapila wrote:

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

I like Peter Smith's suggestion downthread.

+1. Please review the attached v8 patch further.

Patch v8 applied OK, and builds/renders the HTML docs OK, and passes
the regression and subscription TAP tests OK.

(Note - I didn't do any additional manual testing, and I've assumed it
to be covering all the necessary acquire/related logging that you
wanted).

~~

Here are some minor comments:

1.
+ ereport(log_replication_commands ? LOG : DEBUG3,
+ (errmsg("acquired physical replication slot \"%s\"",
+ slotname)));

AFAIK those extra parentheses wrapping the "errmsg" part are not necessary.

~~

2.
extern void LogReplicationSlotAquired(bool is_physical, char *slotname);
extern void LogReplicationSlotReleased(bool is_physical, char *slotname);

The "char *slotname" params of those helper functions should probably
be declared and defined as "const char *slotname".

~~

Otherwise, from a code review perspective the patch v8 LGTM.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#20Euler Taveira
euler@eulerto.com
In reply to: Bharath Rupireddy (#18)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Mar 23, 2023, at 2:33 PM, Bharath Rupireddy wrote:

On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Mar-22, Amit Kapila wrote:

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

I like Peter Smith's suggestion downthread.

+1. Please review the attached v8 patch further.

If you are adding separate functions as suggested, you should add a comment at
the top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions
saying that LogReplicationSlotAquired() and LogReplicationSlotReleased()
functions should be called respectively after it.

My suggestion is that the functions should have the same name with a "Log"
prefix. On of them has a typo "Aquired" in its name. Hence,
LogReplicationSlotAcquire() and LogReplicationSlotRelease() as names. It is
easier to find if someone is grepping by the origin function.

I prefer a sentence that includes a verb.

physical replication slot \"%s\" is acquired
logical replication slot \"%s\" is released

Isn't the PID important for this use case? If so, of course, you can rely on
log_line_prefix (%p) but if the PID is crucial for an investigation then it
should also be included in the message.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#21Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Euler Taveira (#20)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Fri, Mar 24, 2023 at 3:11 AM Euler Taveira <euler@eulerto.com> wrote:

If you are adding separate functions as suggested, you should add a comment at
the top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions
saying that LogReplicationSlotAquired() and LogReplicationSlotReleased()
functions should be called respectively after it.

Done.

My suggestion is that the functions should have the same name with a "Log"
prefix. On of them has a typo "Aquired" in its name. Hence,
LogReplicationSlotAcquire() and LogReplicationSlotRelease() as names. It is
easier to find if someone is grepping by the origin function.

Done.

I prefer a sentence that includes a verb.

physical replication slot \"%s\" is acquired
logical replication slot \"%s\" is released

Hm, changed for now. But I'll leave it to the committer's discretion.

Isn't the PID important for this use case? If so, of course, you can rely on
log_line_prefix (%p) but if the PID is crucial for an investigation then it
should also be included in the message.

On Fri, Mar 24, 2023 at 3:10 AM Peter Smith <smithpb2250@gmail.com> wrote:

Patch v8 applied OK, and builds/renders the HTML docs OK, and passes
the regression and subscription TAP tests OK.

Here are some minor comments:

1.
+ ereport(log_replication_commands ? LOG : DEBUG3,
+ (errmsg("acquired physical replication slot \"%s\"",
+ slotname)));

AFAIK those extra parentheses wrapping the "errmsg" part are not necessary.

Done

2.
extern void LogReplicationSlotAquired(bool is_physical, char *slotname);
extern void LogReplicationSlotReleased(bool is_physical, char *slotname);

The "char *slotname" params of those helper functions should probably
be declared and defined as "const char *slotname".

Done.

Otherwise, from a code review perspective the patch v8 LGTM.

Thanks. Please have a look at the v9 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v9-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v9-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload
From 44d6a8528e22e5882319d1308c74946b95922bfd Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 24 Mar 2023 03:23:15 +0000
Subject: [PATCH v9] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml            |  7 +++--
 src/backend/replication/slot.c      | 49 +++++++++++++++++++++++++++++
 src/backend/replication/walsender.c | 15 +++++++++
 src/backend/utils/misc/guc_tables.c |  2 +-
 src/include/replication/slot.h      |  2 ++
 5 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 481f93cea1..f9b0d3c9a9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7593,9 +7593,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
        </para>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 2293c0c6fc..3e363eaad8 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -180,7 +180,14 @@ ReplicationSlotShmemExit(int code, Datum arg)
 {
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
+	{
+		char *slotname;
+
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
 		ReplicationSlotRelease();
+		LogReplicationSlotRelease(SlotIsPhysical(MyReplicationSlot), slotname);
+		pfree(slotname);
+	}
 
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
@@ -445,6 +452,9 @@ ReplicationSlotName(int index, Name name)
  *
  * An error is raised if nowait is true and the slot is currently in use. If
  * nowait is false, we sleep until the slot is released by the owning process.
+ *
+ * Note: use LogReplicationSlotAcquire() if needed, to log a message after
+ * acquiring the replication slot.
  */
 void
 ReplicationSlotAcquire(const char *name, bool nowait)
@@ -542,6 +552,9 @@ retry:
  *
  * This or another backend can re-acquire the slot later.
  * Resources this slot requires will be preserved.
+ *
+ * Note: use LogReplicationSlotRelease() if needed, to log a message after
+ * releasing the replication slot.
  */
 void
 ReplicationSlotRelease(void)
@@ -596,6 +609,42 @@ ReplicationSlotRelease(void)
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * Helper function to log a message after replication slot acquisition.
+ */
+void
+LogReplicationSlotAcquire(bool is_physical, const char *slotname)
+{
+	Assert(MyReplicationSlot != NULL);
+
+	if (is_physical)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("physical replication slot \"%s\" is acquired",
+					   slotname));
+	else
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("logical replication slot \"%s\" is acquired",
+						slotname));
+}
+
+/*
+ * Helper function to log a message after replication slot release.
+ */
+void
+LogReplicationSlotRelease(bool is_physical, const char *slotname)
+{
+	Assert(MyReplicationSlot == NULL);
+
+	if (is_physical)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("physical replication slot \"%s\" is released",
+						slotname));
+	else
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("logical replication slot \"%s\" is released",
+						slotname));
+}
+
 /*
  * Cleanup all temporary slots created in current session.
  */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 75e8363e24..97ecb0b7d5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -322,7 +322,14 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
+	{
+		char *slotname;
+
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
 		ReplicationSlotRelease();
+		LogReplicationSlotRelease(SlotIsPhysical(MyReplicationSlot), slotname);
+		pfree(slotname);
+	}
 
 	ReplicationSlotCleanup();
 
@@ -703,6 +710,8 @@ StartReplication(StartReplicationCmd *cmd)
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					 errmsg("cannot use a logical replication slot for physical replication")));
 
+		LogReplicationSlotAcquire(true, cmd->slotname);
+
 		/*
 		 * We don't need to verify the slot's restart_lsn here; instead we
 		 * rely on the caller requesting the starting point to use.  If the
@@ -843,7 +852,10 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
+	{
 		ReplicationSlotRelease();
+		LogReplicationSlotRelease(true, cmd->slotname);
+	}
 
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
@@ -1260,6 +1272,8 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 						cmd->slotname),
 				 errdetail("This slot has been invalidated because it exceeded the maximum reserved size.")));
 
+	LogReplicationSlotAcquire(false, cmd->slotname);
+
 	/*
 	 * Force a disconnect, so that the decoding code doesn't need to care
 	 * about an eventual switch from running in recovery, to running in a
@@ -1321,6 +1335,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 
 	FreeDecodingContext(logical_decoding_ctx);
 	ReplicationSlotRelease();
+	LogReplicationSlotRelease(false, cmd->slotname);
 
 	replication_active = false;
 	if (got_STOPPING)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 1c0583fe26..ea41ccc7e2 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1203,7 +1203,7 @@ struct config_bool ConfigureNamesBool[] =
 	},
 	{
 		{"log_replication_commands", PGC_SUSET, LOGGING_WHAT,
-			gettext_noop("Logs each replication command."),
+			gettext_noop("Logs each replication command and related activity."),
 			NULL
 		},
 		&log_replication_commands,
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8872c80cdf..d5faf89368 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -202,6 +202,8 @@ extern void ReplicationSlotDrop(const char *name, bool nowait);
 
 extern void ReplicationSlotAcquire(const char *name, bool nowait);
 extern void ReplicationSlotRelease(void);
+extern void LogReplicationSlotAcquire(bool is_physical, const char *slotname);
+extern void LogReplicationSlotRelease(bool is_physical, const char *slotname);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
-- 
2.34.1

#22Greg Stark
stark@mit.edu
In reply to: Bharath Rupireddy (#21)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

+ ereport(log_replication_commands ? LOG : DEBUG3,
+ (errmsg("acquired physical replication slot \"%s\"",
+ slotname)));

So this is just a bit of bike-shedding but I don't feel like these log
messages really meet the standard we set for our logging. Like what
did the acquiring? What does "acquired" actually mean for a
replication slot? Is there not any meta information about the
acquisition that can give more context to the reader to make this
message more meaningful?

I would expect a log message like this to say, I dunno, something like
"physical replication slot \"%s\" acquired by streaming TCP connection
to 192.168.0.1:999 at LSN ... with xxxMB of logs to read"

I even would be wondering if the other end shouldn't also be logging a
corresponding log and we shouldn't be going out of our way to ensure
there's enough information to match them up and presenting them in a
way that makes that easy.

--
greg

#23Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Greg Stark (#22)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On 2023-Mar-23, Greg Stark wrote:

On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

+ ereport(log_replication_commands ? LOG : DEBUG3,
+ (errmsg("acquired physical replication slot \"%s\"",
+ slotname)));

So this is just a bit of bike-shedding but I don't feel like these log
messages really meet the standard we set for our logging. Like what
did the acquiring? What does "acquired" actually mean for a
replication slot? Is there not any meta information about the
acquisition that can give more context to the reader to make this
message more meaningful?

I would expect a log message like this to say, I dunno, something like
"physical replication slot \"%s\" acquired by streaming TCP connection
to 192.168.0.1:999 at LSN ... with xxxMB of logs to read"

Hmm, I don't disagree with your argument in principle, but I think this
proposal is going too far. I think stating the PID is more than
sufficient. And I don't think we need this patch to go great lengths to
explain what acquisition is, either; I mean, maybe that's a good thing
to have, but then that's a different patch.

I even would be wondering if the other end shouldn't also be logging a
corresponding log and we shouldn't be going out of our way to ensure
there's enough information to match them up and presenting them in a
way that makes that easy.

Hmm, you should be able to match things using the connection
information. I don't think the slot acquisition operation in itself is
that important.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"

#24Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#23)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Fri, Mar 24, 2023 at 3:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Mar-23, Greg Stark wrote:

On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

+ ereport(log_replication_commands ? LOG : DEBUG3,
+ (errmsg("acquired physical replication slot \"%s\"",
+ slotname)));

So this is just a bit of bike-shedding but I don't feel like these log
messages really meet the standard we set for our logging. Like what
did the acquiring? What does "acquired" actually mean for a
replication slot? Is there not any meta information about the
acquisition that can give more context to the reader to make this
message more meaningful?

I would expect a log message like this to say, I dunno, something like
"physical replication slot \"%s\" acquired by streaming TCP connection
to 192.168.0.1:999 at LSN ... with xxxMB of logs to read"

Hmm, I don't disagree with your argument in principle, but I think this
proposal is going too far. I think stating the PID is more than
sufficient.

Do you mean to have something like "physical/logical replication slot
\"%s\" is released/acquired by PID %d", MyProcPid? If yes, the
log_line_prefix already contains PID right? Or do we want to cover the
cases when someone changes log_line_prefix to not contain PID?

And I don't think we need this patch to go great lengths to
explain what acquisition is, either; I mean, maybe that's a good thing
to have, but then that's a different patch.

I even would be wondering if the other end shouldn't also be logging a
corresponding log and we shouldn't be going out of our way to ensure
there's enough information to match them up and presenting them in a
way that makes that easy.

Hmm, you should be able to match things using the connection
information. I don't think the slot acquisition operation in itself is
that important.

Yeah, the intention of the patch is to track the patterns of slot
acquisitions and releases to aid analysis. Of course, this information
alone may not help but when matched with others in the logs, it will.

The v9 patch was failing because I was using MyReplicationSlot after
it got reset by slot release, attached v10 patch fixes it.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v10-0001-Add-LOG-messages-when-replication-slots-become-a.patchapplication/x-patch; name=v10-0001-Add-LOG-messages-when-replication-slots-become-a.patchDownload
From 626dc27e81142cddee656c0140f4570e952e728a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 24 Mar 2023 06:11:52 +0000
Subject: [PATCH v10] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml            |  7 ++--
 src/backend/replication/slot.c      | 51 +++++++++++++++++++++++++++++
 src/backend/replication/walsender.c | 17 ++++++++++
 src/backend/utils/misc/guc_tables.c |  2 +-
 src/include/replication/slot.h      |  2 ++
 5 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 481f93cea1..f9b0d3c9a9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7593,9 +7593,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
        </para>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 2293c0c6fc..2caa8a3532 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -180,7 +180,16 @@ ReplicationSlotShmemExit(int code, Datum arg)
 {
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
+	{
+		bool	is_physical;
+		char	*slotname;
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
 		ReplicationSlotRelease();
+		LogReplicationSlotRelease(is_physical, slotname);
+		pfree(slotname);
+	}
 
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
@@ -445,6 +454,9 @@ ReplicationSlotName(int index, Name name)
  *
  * An error is raised if nowait is true and the slot is currently in use. If
  * nowait is false, we sleep until the slot is released by the owning process.
+ *
+ * Note: use LogReplicationSlotAcquire() if needed, to log a message after
+ * acquiring the replication slot.
  */
 void
 ReplicationSlotAcquire(const char *name, bool nowait)
@@ -542,6 +554,9 @@ retry:
  *
  * This or another backend can re-acquire the slot later.
  * Resources this slot requires will be preserved.
+ *
+ * Note: use LogReplicationSlotRelease() if needed, to log a message after
+ * releasing the replication slot.
  */
 void
 ReplicationSlotRelease(void)
@@ -596,6 +611,42 @@ ReplicationSlotRelease(void)
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * Helper function to log a message after replication slot acquisition.
+ */
+void
+LogReplicationSlotAcquire(bool is_physical, const char *slotname)
+{
+	Assert(MyReplicationSlot != NULL);
+
+	if (is_physical)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("physical replication slot \"%s\" is acquired",
+					   slotname));
+	else
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("logical replication slot \"%s\" is acquired",
+						slotname));
+}
+
+/*
+ * Helper function to log a message after replication slot release.
+ */
+void
+LogReplicationSlotRelease(bool is_physical, const char *slotname)
+{
+	Assert(MyReplicationSlot == NULL);
+
+	if (is_physical)
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("physical replication slot \"%s\" is released",
+						slotname));
+	else
+		ereport(log_replication_commands ? LOG : DEBUG3,
+				errmsg("logical replication slot \"%s\" is released",
+						slotname));
+}
+
 /*
  * Cleanup all temporary slots created in current session.
  */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 75e8363e24..749581b7f1 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -322,7 +322,16 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
+	{
+		bool	is_physical;
+		char	*slotname;
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
 		ReplicationSlotRelease();
+		LogReplicationSlotRelease(is_physical, slotname);
+		pfree(slotname);
+	}
 
 	ReplicationSlotCleanup();
 
@@ -703,6 +712,8 @@ StartReplication(StartReplicationCmd *cmd)
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					 errmsg("cannot use a logical replication slot for physical replication")));
 
+		LogReplicationSlotAcquire(true, cmd->slotname);
+
 		/*
 		 * We don't need to verify the slot's restart_lsn here; instead we
 		 * rely on the caller requesting the starting point to use.  If the
@@ -843,7 +854,10 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
+	{
 		ReplicationSlotRelease();
+		LogReplicationSlotRelease(true, cmd->slotname);
+	}
 
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
@@ -1260,6 +1274,8 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 						cmd->slotname),
 				 errdetail("This slot has been invalidated because it exceeded the maximum reserved size.")));
 
+	LogReplicationSlotAcquire(false, cmd->slotname);
+
 	/*
 	 * Force a disconnect, so that the decoding code doesn't need to care
 	 * about an eventual switch from running in recovery, to running in a
@@ -1321,6 +1337,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 
 	FreeDecodingContext(logical_decoding_ctx);
 	ReplicationSlotRelease();
+	LogReplicationSlotRelease(false, cmd->slotname);
 
 	replication_active = false;
 	if (got_STOPPING)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 1c0583fe26..ea41ccc7e2 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1203,7 +1203,7 @@ struct config_bool ConfigureNamesBool[] =
 	},
 	{
 		{"log_replication_commands", PGC_SUSET, LOGGING_WHAT,
-			gettext_noop("Logs each replication command."),
+			gettext_noop("Logs each replication command and related activity."),
 			NULL
 		},
 		&log_replication_commands,
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8872c80cdf..d5faf89368 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -202,6 +202,8 @@ extern void ReplicationSlotDrop(const char *name, bool nowait);
 
 extern void ReplicationSlotAcquire(const char *name, bool nowait);
 extern void ReplicationSlotRelease(void);
+extern void LogReplicationSlotAcquire(bool is_physical, const char *slotname);
+extern void LogReplicationSlotRelease(bool is_physical, const char *slotname);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
-- 
2.34.1

#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#24)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Mon, Mar 27, 2023 at 11:08 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

The v9 patch was failing because I was using MyReplicationSlot after
it got reset by slot release, attached v10 patch fixes it.

+ *
+ * Note: use LogReplicationSlotAcquire() if needed, to log a message after
+ * acquiring the replication slot.
  */
 void
 ReplicationSlotAcquire(const char *name, bool nowait)
@@ -542,6 +554,9 @@ retry:

When does it need to be logged? For example, recently, we added one
more slot acquisition/release call in
binary_upgrade_logical_slot_has_caught_up(); it is not clear from the
comments whether we need to LOG it or not. I guess at some place like
atop LogReplicationSlotAcquire() we should document in a bit more
specific way as to when is this expected to be called.

--
With Regards,
Amit Kapila.

#26Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#25)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Wed, Nov 1, 2023 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Mar 27, 2023 at 11:08 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

The v9 patch was failing because I was using MyReplicationSlot after
it got reset by slot release, attached v10 patch fixes it.

+ *
+ * Note: use LogReplicationSlotAcquire() if needed, to log a message after
+ * acquiring the replication slot.
*/
void
ReplicationSlotAcquire(const char *name, bool nowait)
@@ -542,6 +554,9 @@ retry:

When does it need to be logged? For example, recently, we added one
more slot acquisition/release call in
binary_upgrade_logical_slot_has_caught_up(); it is not clear from the
comments whether we need to LOG it or not. I guess at some place like
atop LogReplicationSlotAcquire() we should document in a bit more
specific way as to when is this expected to be called.

I agree. Just saying "if needed" in those function comments doesn't
help with knowing how to judge when logging is needed or not.

~

Looking back at the thread history it seems the function comment was
added after Euler [1]Euler - /messages/by-id/c42d5634-ca9b-49a7-85cd-9eff9feb33b4@app.fastmail.com suggested ("... you should add a comment at the
top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions
saying that LogReplicationSlotAquired() and
LogReplicationSlotReleased() functions should be called respectively
after it.")

But that's not quite compatible with what Alvaro [2]Alvaro - /messages/by-id/202204291032.qfvyuqxkjnjw@alvherre.pgsql had written long
back ("... the only acquisitions that would log messages are those in
StartReplication and StartLogicalReplication.").

In other words, ReplicationSlotAcquire/ReplicationSlotRelease is
called by more places than you care to log from.

~

Adding a better explanatory comment than "if needed" will be good, and
maybe that is all that is necessary. I'm not sure.

OTOH, if you have to explain that logging is only wanted for a couple
of scenarios, then it raises some doubts about the usefulness of
having a common function in the first place. I had the same doubts
back in March [3]Peter - /messages/by-id/CAHut+Pu6Knwooc_NckMxszGrAJnytgpMadtoJ-OA-SFWT+GFYw@mail.gmail.com ("I am not sure for the *current* code if the
encapsulation is worth the trouble or not.").

======
[1]: Euler - /messages/by-id/c42d5634-ca9b-49a7-85cd-9eff9feb33b4@app.fastmail.com
[2]: Alvaro - /messages/by-id/202204291032.qfvyuqxkjnjw@alvherre.pgsql
[3]: Peter - /messages/by-id/CAHut+Pu6Knwooc_NckMxszGrAJnytgpMadtoJ-OA-SFWT+GFYw@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Austalia

#27Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Peter Smith (#26)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Nov 2, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote:

But that's not quite compatible with what Alvaro [2] had written long
back ("... the only acquisitions that would log messages are those in
StartReplication and StartLogicalReplication.").

In other words, ReplicationSlotAcquire/ReplicationSlotRelease is
called by more places than you care to log from.

I refreshed my thoughts for this patch and I think it's enough if
walsenders alone log messages when slots become active and inactive.
How about something like the attached v11 patch?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v11-0001-Emit-messages-when-replication-slots-become-acti.patchapplication/x-patch; name=v11-0001-Emit-messages-when-replication-slots-become-acti.patchDownload
From eed7c96d9cab98a576728792c2cb111ae61f930f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sat, 4 Nov 2023 21:59:27 +0000
Subject: [PATCH v11] Emit messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml            | 10 +++---
 src/backend/replication/slot.c      | 54 ++++++++++++++++++++++++++++-
 src/backend/replication/walsender.c | 10 +++---
 src/include/replication/slot.h      |  2 ++
 4 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bd70ff2e4b..6d14a94cf4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7510,11 +7510,11 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
+        Causes each replication command and related activity to be logged in
+        the server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>. Only superusers and users with the appropriate
+        <literal>SET</literal> privilege can change this setting.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 99823df3c7..dfb564b04f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -181,7 +181,12 @@ ReplicationSlotShmemExit(int code, Datum arg)
 {
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
+	{
+		if (am_walsender)
+			ReplicationSlotReleaseAndLog();
+		else
+			ReplicationSlotRelease();
+	}
 
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
@@ -539,6 +544,26 @@ retry:
 		pgstat_acquire_replslot(s);
 }
 
+/*
+ * A wrapper around ReplicationSlotAcquire() but logs a message when
+ * appropriate.
+ *
+ * This function is currently used only in walsender.
+ */
+void
+ReplicationSlotAcquireAndLog(const char *name, bool nowait)
+{
+	Assert(am_walsender);
+
+	ReplicationSlotAcquire(name, nowait);
+
+	ereport(log_replication_commands ? LOG : DEBUG1,
+			(errmsg("walsender process with PID %d acquired %s replication slot \"%s\"",
+					MyProcPid,
+					SlotIsPhysical(MyReplicationSlot) ? "physical" : "logical",
+					NameStr(MyReplicationSlot->data.name))));
+}
+
 /*
  * Release the replication slot that this backend considers to own.
  *
@@ -598,6 +623,33 @@ ReplicationSlotRelease(void)
 	LWLockRelease(ProcArrayLock);
 }
 
+/*
+ * A wrapper around ReplicationSlotAcquire() but logs a message when
+ * appropriate.
+ *
+ * This function is currently used only in walsender.
+ */
+void
+ReplicationSlotReleaseAndLog(void)
+{
+	char	   *slotname;
+	bool		is_physical;
+
+	Assert(am_walsender);
+
+	slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
+	is_physical = SlotIsPhysical(MyReplicationSlot);
+
+	ReplicationSlotRelease();
+
+	ereport(log_replication_commands ? LOG : DEBUG1,
+			(errmsg("walsender process with PID %d released %s replication slot \"%s\"",
+					MyProcPid,
+					is_physical ? "physical" : "logical", slotname)));
+
+	pfree(slotname);
+}
+
 /*
  * Cleanup all temporary slots created in current session.
  */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e250b0567e..2a1c5056c0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -322,7 +322,7 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
+		ReplicationSlotReleaseAndLog();
 
 	ReplicationSlotCleanup();
 
@@ -697,7 +697,7 @@ StartReplication(StartReplicationCmd *cmd)
 
 	if (cmd->slotname)
 	{
-		ReplicationSlotAcquire(cmd->slotname, true);
+		ReplicationSlotAcquireAndLog(cmd->slotname, true);
 		if (SlotIsLogical(MyReplicationSlot))
 			ereport(ERROR,
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
@@ -843,7 +843,7 @@ StartReplication(StartReplicationCmd *cmd)
 	}
 
 	if (cmd->slotname)
-		ReplicationSlotRelease();
+		ReplicationSlotReleaseAndLog();
 
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
@@ -1261,7 +1261,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 
 	Assert(!MyReplicationSlot);
 
-	ReplicationSlotAcquire(cmd->slotname, true);
+	ReplicationSlotAcquireAndLog(cmd->slotname, true);
 
 	/*
 	 * Force a disconnect, so that the decoding code doesn't need to care
@@ -1323,7 +1323,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	WalSndLoop(XLogSendLogical);
 
 	FreeDecodingContext(logical_decoding_ctx);
-	ReplicationSlotRelease();
+	ReplicationSlotReleaseAndLog();
 
 	replication_active = false;
 	if (got_STOPPING)
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index d3535eed58..3dc8699106 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,9 @@ extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
 extern void ReplicationSlotAcquire(const char *name, bool nowait);
+extern void ReplicationSlotAcquireAndLog(const char *name, bool nowait);
 extern void ReplicationSlotRelease(void);
+extern void ReplicationSlotReleaseAndLog(void);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
-- 
2.34.1

#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#27)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Nov 2, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote:

But that's not quite compatible with what Alvaro [2] had written long
back ("... the only acquisitions that would log messages are those in
StartReplication and StartLogicalReplication.").

In other words, ReplicationSlotAcquire/ReplicationSlotRelease is
called by more places than you care to log from.

I refreshed my thoughts for this patch and I think it's enough if
walsenders alone log messages when slots become active and inactive.
How about something like the attached v11 patch?

+ * This function is currently used only in walsender.
+ */
+void
+ReplicationSlotAcquireAndLog(const char *name, bool nowait)

BTW, is the reason for using it only in walsender is that it is a
background process and it is not very apparent whether the slot is
created, and for foreground processes, it is a bit clear when the
command is executed. If so, the other alternative is to either use a
parameter to the existing function or directly use am_walsender flag
to distinguish when to print the message in acquire/release calls.

Can you please tell me the use case of this additional message?

A few other minor comments:
1.
+        Causes each replication command and related activity to be logged in
+        the server log.

Can we be bit more specific by changing to something like: "Causes
each replication command and slot acquisition/release to be logged in
the server log."

2.
+ ereport(log_replication_commands ? LOG : DEBUG1,
+ (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"",

It seems PID and process name is quite unlike what we print in other
similar messages. For example, see below messages when we start
replication via pub/sub :

2023-11-06 08:41:57.867 IST [24400] LOG: received replication
command: CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput (SNAPSHOT
'nothing')
2023-11-06 08:41:57.867 IST [24400] STATEMENT:
CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput (SNAPSHOT 'nothing')
...
...
2023-11-06 08:41:57.993 IST [22332] LOG: walsender process with PID
22332 acquired logical replication slot "sub1"
2023-11-06 08:41:57.993 IST [22332] STATEMENT: START_REPLICATION SLOT
"sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names
'"pub1"')
...
...
2023-11-06 08:41:58.015 IST [22332] LOG: starting logical decoding
for slot "sub1"
2023-11-06 08:41:58.015 IST [22332] DETAIL: Streaming transactions
committing after 0/1522730, reading WAL from 0/15226F8.
2023-11-06 08:41:58.015 IST [22332] STATEMENT: START_REPLICATION SLOT
"sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names
'"pub1"')

We can get the PID from the log line as for other logs and I don't see
the process name printed anywhere else.

--
With Regards,
Amit Kapila.

#29Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#27)
4 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

Hi,

FWIW, I used a small script to set up the following environment to
help observe the log messages written by this patch.

Node1 = primary; it creates a PUBLICATION
Node2 = physical standby; the publication is replicated here
Node3 = subscriber; it subscribes to the publication now on the standby Node2

~

In the log files you can see:

1. The "physical" message from the walsender of the primary (Node1)
e.g. 2023-11-06 18:59:58.094 AEDT [17091] LOG: walsender process with
PID 17091 acquired physical replication slot "pg_basebackup_17091"

2. The "logical" message from the pub/sub walsender of the standby (Node2)
e.g. 2023-11-06 19:16:29.053 AEDT [12774] LOG: walsender process with
PID 12774 acquired logical replication slot "sub1"

~

PSA the test script and logs

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments:

ps_setup.shtext/x-sh; charset=US-ASCII; name=ps_setup.shDownload
sub.logapplication/octet-stream; name=sub.logDownload
primarydb.logapplication/octet-stream; name=primarydb.logDownload
standby.logapplication/octet-stream; name=standby.logDownload
#30Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#28)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Mon, Nov 6, 2023 at 9:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Nov 2, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote:

But that's not quite compatible with what Alvaro [2] had written long
back ("... the only acquisitions that would log messages are those in
StartReplication and StartLogicalReplication.").

In other words, ReplicationSlotAcquire/ReplicationSlotRelease is
called by more places than you care to log from.

I refreshed my thoughts for this patch and I think it's enough if
walsenders alone log messages when slots become active and inactive.
How about something like the attached v11 patch?

+ * This function is currently used only in walsender.
+ */
+void
+ReplicationSlotAcquireAndLog(const char *name, bool nowait)

BTW, is the reason for using it only in walsender is that it is a
background process and it is not very apparent whether the slot is
created, and for foreground processes, it is a bit clear when the
command is executed.

Can you please tell me the use case of this additional message?

Replication slot acquisitions and releases by backends say when
running pg_replication_slot_advance or pg_logical_slot_get_changes or
pg_drop_replication_slot or pg_create_{physical,
logical}_replication_slot are transient unlike walsenders which
comparatively hold slots for longer durations. Therefore, I've added
them only for walsenders. These messages help to know the lifetime of
a replication slot - one can know how long a streaming standby or
logical subscriber is down, IOW, how long a replication slot is
inactive in production. For instance, the time between released and
acquired slots in the below messages is the inactive replication slot
duration.

2023-11-13 11:06:34.338 UTC [470262] LOG: acquired physical
replication slot "sb_repl_slot"
2023-11-13 11:06:34.338 UTC [470262] STATEMENT: START_REPLICATION
SLOT "sb_repl_slot" 0/3000000 TIMELINE 1
2023-11-13 11:09:24.918 UTC [470262] LOG: released physical
replication slot "sb_repl_slot"
2023-11-13 12:01:40.530 UTC [470967] LOG: acquired physical
replication slot "sb_repl_slot"
2023-11-13 12:01:40.530 UTC [470967] STATEMENT: START_REPLICATION
SLOT "sb_repl_slot" 0/3000000 TIMELINE 1

If so, the other alternative is to either use a
parameter to the existing function or directly use am_walsender flag
to distinguish when to print the message in acquire/release calls.

Done that way. PSA v12.

A few other minor comments:
1.
+        Causes each replication command and related activity to be logged in
+        the server log.

Can we be bit more specific by changing to something like: "Causes
each replication command and slot acquisition/release to be logged in
the server log."

Done.

2.
+ ereport(log_replication_commands ? LOG : DEBUG1,
+ (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"",

It seems PID and process name is quite unlike what we print in other
similar messages. For example, see below messages when we start
replication via pub/sub :

We can get the PID from the log line as for other logs and I don't see
the process name printed anywhere else.

There was a comment upthread to have PID printed, but I agree to be
consistent and changed the messages to be: acquired physical/logical
replication slot "foo" and released physical/logical replication slot
"foo".

PSA v12 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v12-0001-Emit-messages-when-replication-slots-become-acti.patchapplication/octet-stream; name=v12-0001-Emit-messages-when-replication-slots-become-acti.patchDownload
From cd784c50ed0fc928241f109e19f0f5c9afaa01ef Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 13 Nov 2023 12:04:11 +0000
Subject: [PATCH v12] Emit messages when replication slots become active and
 inactive

This commit adds log messages (at LOG level when
log_replication_commands is set, otherwise at DEBUG1 level) when
walsenders acquire and release replication slots. These messages
help to know the lifetime of a replication slot - one can know how
long a streaming standby or logical subscriber or replication slot
consumer is down. In other words, one can know how long a
replication slot is inactive - the time between released and
acquired slot messages is the inactive replication slot duration.
These messages will be extremely useful on production servers to
debug and analyze inactive replication slot issues.

Note that these messages are emitted only for walsenders when
replication slot is acquired and released, but not for backends.
This is because walsenders are the ones that typically hold
replication slots for longer durations unlike backends which hold
replication slots for executing some replication related
functions.
---
 doc/src/sgml/config.sgml       | 10 +++++-----
 src/backend/replication/slot.c | 23 +++++++++++++++++++++++
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc35a46e5e..1dc014ce84 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7510,11 +7510,11 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
+        Causes each replication command and slot acquisition/release to be
+        logged in the server log. See <xref linkend="protocol-replication"/>
+        for more information about replication command. The default value is
+        <literal>off</literal>. Only superusers and users with the appropriate
+        <literal>SET</literal> privilege can change this setting.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 781aa43cc4..fd8b8151b6 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -537,6 +537,12 @@ retry:
 	 */
 	if (SlotIsLogical(s))
 		pgstat_acquire_replslot(s);
+
+	if (am_walsender)
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				errmsg_internal("acquired %s replication slot \"%s\"",
+								SlotIsPhysical(MyReplicationSlot) ? "physical" : "logical",
+								NameStr(MyReplicationSlot->data.name)));
 }
 
 /*
@@ -549,9 +555,17 @@ void
 ReplicationSlotRelease(void)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
+	char	   *slotname;
+	bool		is_physical;
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (am_walsender)
+	{
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+	}
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
@@ -596,6 +610,15 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				errmsg_internal("released %s replication slot \"%s\"",
+								is_physical ? "physical" : "logical", slotname));
+
+		pfree(slotname);
+	}
 }
 
 /*
-- 
2.34.1

#31Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#30)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Mon, Nov 13, 2023 at 5:43 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Nov 6, 2023 at 9:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Nov 2, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote:

But that's not quite compatible with what Alvaro [2] had written long
back ("... the only acquisitions that would log messages are those in
StartReplication and StartLogicalReplication.").

In other words, ReplicationSlotAcquire/ReplicationSlotRelease is
called by more places than you care to log from.

I refreshed my thoughts for this patch and I think it's enough if
walsenders alone log messages when slots become active and inactive.
How about something like the attached v11 patch?

+ * This function is currently used only in walsender.
+ */
+void
+ReplicationSlotAcquireAndLog(const char *name, bool nowait)

BTW, is the reason for using it only in walsender is that it is a
background process and it is not very apparent whether the slot is
created, and for foreground processes, it is a bit clear when the
command is executed.

Can you please tell me the use case of this additional message?

Replication slot acquisitions and releases by backends say when
running pg_replication_slot_advance or pg_logical_slot_get_changes or
pg_drop_replication_slot or pg_create_{physical,
logical}_replication_slot are transient unlike walsenders which
comparatively hold slots for longer durations. Therefore, I've added
them only for walsenders. These messages help to know the lifetime of
a replication slot - one can know how long a streaming standby or
logical subscriber is down, IOW, how long a replication slot is
inactive in production. For instance, the time between released and
acquired slots in the below messages is the inactive replication slot
duration.

2023-11-13 11:06:34.338 UTC [470262] LOG: acquired physical
replication slot "sb_repl_slot"
2023-11-13 11:06:34.338 UTC [470262] STATEMENT: START_REPLICATION
SLOT "sb_repl_slot" 0/3000000 TIMELINE 1
2023-11-13 11:09:24.918 UTC [470262] LOG: released physical
replication slot "sb_repl_slot"
2023-11-13 12:01:40.530 UTC [470967] LOG: acquired physical
replication slot "sb_repl_slot"
2023-11-13 12:01:40.530 UTC [470967] STATEMENT: START_REPLICATION
SLOT "sb_repl_slot" 0/3000000 TIMELINE 1

If so, the other alternative is to either use a
parameter to the existing function or directly use am_walsender flag
to distinguish when to print the message in acquire/release calls.

Done that way. PSA v12.

A few other minor comments:
1.
+        Causes each replication command and related activity to be logged in
+        the server log.

Can we be bit more specific by changing to something like: "Causes
each replication command and slot acquisition/release to be logged in
the server log."

Done.

2.
+ ereport(log_replication_commands ? LOG : DEBUG1,
+ (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"",

It seems PID and process name is quite unlike what we print in other
similar messages. For example, see below messages when we start
replication via pub/sub :

We can get the PID from the log line as for other logs and I don't see
the process name printed anywhere else.

There was a comment upthread to have PID printed, but I agree to be
consistent and changed the messages to be: acquired physical/logical
replication slot "foo" and released physical/logical replication slot
"foo".

PSA v12 patch.

Compiler isn't happy with v12
https://cirrus-ci.com/task/5543061376204800?logs=gcc_warning#L405. PSA
v13 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v13-0001-Log-messages-for-replication-slot-acquisition-an.patchapplication/octet-stream; name=v13-0001-Log-messages-for-replication-slot-acquisition-an.patchDownload
From 390cd5c81518f209e30e7dde3220185ae006b04d Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 13 Nov 2023 14:08:23 +0000
Subject: [PATCH v13] Log messages for replication slot acquisition and release

This commit adds log messages (at LOG level when
log_replication_commands is set, otherwise at DEBUG1 level) when
walsenders acquire and release replication slots. These messages
help to know the lifetime of a replication slot - one can know how
long a streaming standby or logical subscriber or replication slot
consumer is down. In other words, one can know how long a
replication slot is inactive - the time between released and
acquired slot messages is the inactive replication slot duration.
These messages will be extremely useful on production servers to
debug and analyze inactive replication slot issues.

Note that these messages are emitted only for walsenders when
replication slot is acquired and released, but not for backends.
This is because walsenders are the ones that typically hold
replication slots for longer durations unlike backends which hold
replication slots for executing some replication related
functions.
---
 doc/src/sgml/config.sgml       | 10 +++++-----
 src/backend/replication/slot.c | 24 ++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc35a46e5e..1dc014ce84 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7510,11 +7510,11 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
+        Causes each replication command and slot acquisition/release to be
+        logged in the server log. See <xref linkend="protocol-replication"/>
+        for more information about replication command. The default value is
+        <literal>off</literal>. Only superusers and users with the appropriate
+        <literal>SET</literal> privilege can change this setting.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 781aa43cc4..379aece031 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -537,6 +537,12 @@ retry:
 	 */
 	if (SlotIsLogical(s))
 		pgstat_acquire_replslot(s);
+
+	if (am_walsender)
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				errmsg_internal("acquired %s replication slot \"%s\"",
+								SlotIsPhysical(MyReplicationSlot) ? "physical" : "logical",
+								NameStr(MyReplicationSlot->data.name)));
 }
 
 /*
@@ -549,9 +555,17 @@ void
 ReplicationSlotRelease(void)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
+	char	   *slotname = NULL;	/* keep compiler quiet */
+	bool		is_physical = false;	/* keep compiler quiet */
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (am_walsender)
+	{
+		slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+	}
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
@@ -596,6 +610,16 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				errmsg_internal("released %s replication slot \"%s\"",
+								is_physical ? "physical" : "logical",
+								slotname));
+
+		pfree(slotname);
+	}
 }
 
 /*
-- 
2.34.1

#32Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#31)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

Here are some review comments for patch v13-0001.

======
doc/src/sgml/config.sgml

1.
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
+        Causes each replication command and slot acquisition/release to be
+        logged in the server log. See <xref linkend="protocol-replication"/>
+        for more information about replication command. The default value is
+        <literal>off</literal>. Only superusers and users with the appropriate
+        <literal>SET</literal> privilege can change this setting.
        </para>

Should that also mention about walsender?

e.g.
"and slot acquisition/release" ==> "and <literal>walsender</literal>
slot acquisition/release"

======
src/backend/replication/slot.c

2. ReplicationSlotAcquire

  if (SlotIsLogical(s))
  pgstat_acquire_replslot(s);
+
+ if (am_walsender)
+ ereport(log_replication_commands ? LOG : DEBUG1,
+ errmsg_internal("acquired %s replication slot \"%s\"",
+ SlotIsPhysical(MyReplicationSlot) ? "physical" : "logical",
+ NameStr(MyReplicationSlot->data.name)));

2a.
Instead of calling SlotIsLogical() and then again calling
SlotIsPhysical(), it might be better to assign this one time to a
local variable.

~

2b.
IMO it is better to continue using variable 's' here instead of
'MyReplicationSlot'. Code is not only shorter but is also consistent
with the rest of the function which never uses MyReplicationSlot, even
in the places where it could have.

~

SUGGESTION (for #2a and #2b)
is_logical = SlotIsLogical(s);
if (is_logical)
pgstat_acquire_replslot(s);

if (am_walsender)
ereport(log_replication_commands ? LOG : DEBUG1,
errmsg_internal("acquired %s replication slot \"%s\"",
is_logical ? "logical" : "physical", NameStr(s->data.name)));

~~~

3. ReplicationSlotRelease

 ReplicationSlotRelease(void)
 {
  ReplicationSlot *slot = MyReplicationSlot;
+ char    *slotname = NULL; /* keep compiler quiet */
+ bool is_physical = false; /* keep compiler quiet */

Assert(slot != NULL && slot->active_pid != 0);

+ if (am_walsender)
+ {
+ slotname = pstrdup(NameStr(MyReplicationSlot->data.name));
+ is_physical = SlotIsPhysical(MyReplicationSlot);
+ }
+

3a.
Notice 'MyReplicationSlot' is already assigned to the local 'slot'
variable, so IMO it is better if this new code also uses that 'slot'
variable for consistency with the rest of the function.

~

3b.
Consider flipping the flag to be 'is_logical' instead of
'is_physical', so the ereport substitution will match the other
ReplicationSlotAcquirecode suggested above (#2a).

~

SUGGESTION (For #3a and #3b)
if (am_walsender)
{
slotname = pstrdup(NameStr(slot->data.name));
is_logical = SlotIsLogical(slot);
}

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#33Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Peter Smith (#32)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Tue, Nov 14, 2023 at 4:50 AM Peter Smith <smithpb2250@gmail.com> wrote:

Here are some review comments for patch v13-0001.

Thanks.

======
doc/src/sgml/config.sgml

1.

Should that also mention about walsender?

e.g.
"and slot acquisition/release" ==> "and <literal>walsender</literal>
slot acquisition/release"

Changed.

2a.
Instead of calling SlotIsLogical() and then again calling
SlotIsPhysical(), it might be better to assign this one time to a
local variable.

2b.
IMO it is better to continue using variable 's' here instead of
'MyReplicationSlot'. Code is not only shorter but is also consistent
with the rest of the function which never uses MyReplicationSlot, even
in the places where it could have.

SUGGESTION (for #2a and #2b)
is_logical = SlotIsLogical(s);
if (is_logical)
pgstat_acquire_replslot(s);

if (am_walsender)
ereport(log_replication_commands ? LOG : DEBUG1,
errmsg_internal("acquired %s replication slot \"%s\"",
is_logical ? "logical" : "physical", NameStr(s->data.name)));

Use of a separate variable isn't good IMO, I used SlotIsLogical(s); directly.

3a.
Notice 'MyReplicationSlot' is already assigned to the local 'slot'
variable, so IMO it is better if this new code also uses that 'slot'
variable for consistency with the rest of the function.

3b.
Consider flipping the flag to be 'is_logical' instead of
'is_physical', so the ereport substitution will match the other
ReplicationSlotAcquirecode suggested above (#2a).

SUGGESTION (For #3a and #3b)
if (am_walsender)
{
slotname = pstrdup(NameStr(slot->data.name));
is_logical = SlotIsLogical(slot);
}

Done.

PSA v14 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v14-0001-Log-messages-for-replication-slot-acquisition-an.patchapplication/x-patch; name=v14-0001-Log-messages-for-replication-slot-acquisition-an.patchDownload
From 87d04d040fc048af2065f34c6c5887c84d4e15da Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 14 Nov 2023 06:53:02 +0000
Subject: [PATCH v14] Log messages for replication slot acquisition and release

This commit adds log messages (at LOG level when
log_replication_commands is set, otherwise at DEBUG1 level) when
walsenders acquire and release replication slots. These messages
help to know the lifetime of a replication slot - one can know how
long a streaming standby or logical subscriber or replication slot
consumer is down. In other words, one can know how long a
replication slot is inactive - the time between released and
acquired slot messages is the inactive replication slot duration.
These messages will be extremely useful on production servers to
debug and analyze inactive replication slot issues.

Note that these messages are emitted only for walsenders when
replication slot is acquired and released, but not for backends.
This is because walsenders are the ones that typically hold
replication slots for longer durations unlike backends which hold
replication slots for executing some replication related
functions.
---
 doc/src/sgml/config.sgml       |  7 ++++---
 src/backend/replication/slot.c | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc35a46e5e..d87320141a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7510,9 +7510,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
+        Causes each replication command and <literal>walsender</literal>
+        process replication slot acquisition/release to be logged in the server
+        log. See <xref linkend="protocol-replication"/> for more information
+        about replication command. The default value is <literal>off</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
        </para>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 781aa43cc4..5d6ba0aef1 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -537,6 +537,12 @@ retry:
 	 */
 	if (SlotIsLogical(s))
 		pgstat_acquire_replslot(s);
+
+	if (am_walsender)
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				errmsg_internal("acquired %s replication slot \"%s\"",
+								SlotIsLogical(s) ? "logical" : "physical",
+								NameStr(s->data.name)));
 }
 
 /*
@@ -549,9 +555,17 @@ void
 ReplicationSlotRelease(void)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
+	char	   *slotname = NULL;	/* keep compiler quiet */
+	bool		is_logical = false; /* keep compiler quiet */
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (am_walsender)
+	{
+		slotname = pstrdup(NameStr(slot->data.name));
+		is_logical = SlotIsLogical(slot);
+	}
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
@@ -596,6 +610,16 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				errmsg_internal("released %s replication slot \"%s\"",
+								is_logical ? "logical" : "physical",
+								slotname));
+
+		pfree(slotname);
+	}
 }
 
 /*
-- 
2.34.1

#34Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#33)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

Hi,

Thanks for addressing my previous comments. Patch v14-0001 looks good
to me, except I have one question:

The patch uses errmsg_internal() for the logging, but I noticed the
only other code using GUC 'log_replication_commands' has errmsg()
instead of errmsg_internal(). Isn't it better to be consistent with
the existing code?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#35Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#34)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Wed, Nov 15, 2023 at 4:40 AM Peter Smith <smithpb2250@gmail.com> wrote:

Thanks for addressing my previous comments. Patch v14-0001 looks good
to me, except I have one question:

The patch uses errmsg_internal() for the logging, but I noticed the
only other code using GUC 'log_replication_commands' has errmsg()
instead of errmsg_internal(). Isn't it better to be consistent with
the existing code?

I agree that we should errmsg here. If we read the description of
errmsg_internal() [1]https://www.postgresql.org/docs/devel/error-message-reporting.html, it is recommended to be used for "cannot
happen" cases where we don't want to spend translation effort which is
not the case here. Also, similar to the below message, we should add a
comment for a translator.

ereport(LOG,
/* translator: %s is SIGKILL or SIGABRT */
(errmsg("issuing %s to recalcitrant children",
send_abort_for_kill ? "SIGABRT" : "SIGKILL")));

Another minor comment:
+        Causes each replication command and <literal>walsender</literal>
+        process replication slot acquisition/release to be logged in the server
+        log.

Isn't it better to use process's instead of process in the above sentence?

[1]: https://www.postgresql.org/docs/devel/error-message-reporting.html

--
With Regards,
Amit Kapila.

#36Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#35)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Wed, Nov 15, 2023 at 9:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 15, 2023 at 4:40 AM Peter Smith <smithpb2250@gmail.com> wrote:

Thanks for addressing my previous comments. Patch v14-0001 looks good
to me, except I have one question:

The patch uses errmsg_internal() for the logging, but I noticed the
only other code using GUC 'log_replication_commands' has errmsg()
instead of errmsg_internal(). Isn't it better to be consistent with
the existing code?

I agree that we should errmsg here. If we read the description of
errmsg_internal() [1], it is recommended to be used for "cannot
happen" cases where we don't want to spend translation effort which is
not the case here.

I chose not to translate the newly added messages as they are only
written to server logs not sent to the client. However, I've changed
to errmsg, after looking at the errmsg_internal docs.

Also, similar to the below message, we should add a
comment for a translator.

ereport(LOG,
/* translator: %s is SIGKILL or SIGABRT */
(errmsg("issuing %s to recalcitrant children",
send_abort_for_kill ? "SIGABRT" : "SIGKILL")));

Added.

Another minor comment:
+        Causes each replication command and <literal>walsender</literal>
+        process replication slot acquisition/release to be logged in the server
+        log.

Isn't it better to use process's instead of process in the above sentence?

Changed.

PSA v15 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v15-0001-Log-messages-for-replication-slot-acquisition-an.patchapplication/octet-stream; name=v15-0001-Log-messages-for-replication-slot-acquisition-an.patchDownload
From 3cf4416fc01cd27bf90059e136dc008b8b5b726f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 15 Nov 2023 05:21:45 +0000
Subject: [PATCH v15] Log messages for replication slot acquisition and release

This commit adds log messages (at LOG level when
log_replication_commands is set, otherwise at DEBUG1 level) when
walsenders acquire and release replication slots. These messages
help to know the lifetime of a replication slot - one can know how
long a streaming standby or logical subscriber or replication slot
consumer is down. In other words, one can know how long a
replication slot is inactive - the time between released and
acquired slot messages is the inactive replication slot duration.
These messages will be extremely useful on production servers to
debug and analyze inactive replication slot issues.

Note that these messages are emitted only for walsenders when
replication slot is acquired and released, but not for backends.
This is because walsenders are the ones that typically hold
replication slots for longer durations unlike backends which hold
replication slots for executing some replication related
functions.
---
 doc/src/sgml/config.sgml       | 11 ++++++-----
 src/backend/replication/slot.c | 36 ++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc35a46e5e..9398afbcbd 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7510,11 +7510,12 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
+        Causes each replication command and <literal>walsender</literal>
+        process's replication slot acquisition/release to be logged in the
+        server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>. Only superusers and users with the appropriate
+        <literal>SET</literal> privilege can change this setting.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 781aa43cc4..bac21244bd 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -537,6 +537,19 @@ retry:
 	 */
 	if (SlotIsLogical(s))
 		pgstat_acquire_replslot(s);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+
+		/*
+		 * translator: first %s is replication slot type, eg logical or
+		 * physical
+		 */
+				errmsg("acquired %s replication slot \"%s\"",
+					   SlotIsLogical(s) ? "logical" : "physical",
+					   NameStr(s->data.name)));
+	}
 }
 
 /*
@@ -549,9 +562,17 @@ void
 ReplicationSlotRelease(void)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
+	char	   *slotname = NULL;	/* keep compiler quiet */
+	bool		is_logical = false; /* keep compiler quiet */
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (am_walsender)
+	{
+		slotname = pstrdup(NameStr(slot->data.name));
+		is_logical = SlotIsLogical(slot);
+	}
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
@@ -596,6 +617,21 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+
+		/*
+		 * translator: first %s is replication slot type, eg logical or
+		 * physical
+		 */
+				errmsg("released %s replication slot \"%s\"",
+					   is_logical ? "logical" : "physical",
+					   slotname));
+
+		pfree(slotname);
+	}
 }
 
 /*
-- 
2.34.1

#37Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#35)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

Patch v15-0001 LGTM.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#38Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#36)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Wed, Nov 15, 2023 at 11:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

PSA v15 patch.

The patch looks good to me. I have slightly modified the translator
message and commit message in the attached. I'll push this tomorrow
unless there are any comments.

--
With Regards,
Amit Kapila.

Attachments:

v16-0001-Log-messages-for-replication-slot-acquisition-an.patchapplication/octet-stream; name=v16-0001-Log-messages-for-replication-slot-acquisition-an.patchDownload
From 9e76c6bef0350718d36a6db5993baa4cdeacc5e3 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 15 Nov 2023 05:21:45 +0000
Subject: [PATCH v16] Log messages for replication slot acquisition and
 release.

This commit log messages (at LOG level when log_replication_commands is
set, otherwise at DEBUG1 level) when walsenders acquire and release
replication slots. These messages help to know the lifetime of a
replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down. These messages will be
useful on production servers to debug and analyze inactive replication
slots.

Note that these messages are emitted only for walsenders but not for
backends. This is because walsenders are the ones that typically hold
replication slots for longer durations, unlike backends which hold them
for executing replication related functions.

Author: Bharath Rupireddy
Reviewed-by: Peter Smith, Amit Kapila
Discussion: http://postgr.es/m/CALj2ACX17G7F-jeLt+7KhJ6YxVeRwR8Zk0rDh4VnT546o0UpTQ@mail.gmail.com
---
 doc/src/sgml/config.sgml       | 11 ++++++-----
 src/backend/replication/slot.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc35a46e5e..9398afbcbd 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7510,11 +7510,12 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
+        Causes each replication command and <literal>walsender</literal>
+        process's replication slot acquisition/release to be logged in the
+        server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>. Only superusers and users with the appropriate
+        <literal>SET</literal> privilege can change this setting.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 781aa43cc4..6f014d0df5 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -537,6 +537,15 @@ retry:
 	 */
 	if (SlotIsLogical(s))
 		pgstat_acquire_replslot(s);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+		/* translator: first %s is logical or physical slot */
+				errmsg("acquired %s replication slot \"%s\"",
+					   SlotIsLogical(s) ? "logical" : "physical",
+					   NameStr(s->data.name)));
+	}
 }
 
 /*
@@ -549,9 +558,17 @@ void
 ReplicationSlotRelease(void)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
+	char	   *slotname = NULL;	/* keep compiler quiet */
+	bool		is_logical = false; /* keep compiler quiet */
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (am_walsender)
+	{
+		slotname = pstrdup(NameStr(slot->data.name));
+		is_logical = SlotIsLogical(slot);
+	}
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
@@ -596,6 +613,17 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+		/* translator: first %s is logical or physical slot */
+				errmsg("released %s replication slot \"%s\"",
+					   is_logical ? "logical" : "physical",
+					   slotname));
+
+		pfree(slotname);
+	}
 }
 
 /*
-- 
2.28.0.windows.1

#39Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#38)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

Translation-wise, this doesn't work, because you're building a string.
There's no reason to think that the words "logical" and "physical"
should stay untranslated; the message would make no sense, or at least
would be very ugly.

You should do something like

if (am_walsender)
{
ereport(log_replication_commands ? LOG : DEBUG1,
SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) :
errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name)));
}

(Obviously, lose the "translator:" comments since they are unnecessary)

If you really want to keep the "logical"/"physical" word untranslated,
you need to split it out of the sentence somehow. But it would be
really horrible IMO. Like

errmsg("acquired replication slot \"%s\" of type \"%s\"",
NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical")

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"

#40Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#39)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Translation-wise, this doesn't work, because you're building a string.
There's no reason to think that the words "logical" and "physical"
should stay untranslated; the message would make no sense, or at least
would be very ugly.

You should do something like

if (am_walsender)
{
ereport(log_replication_commands ? LOG : DEBUG1,
SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) :
errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name)));
}

This seems better, so done that way.

(Obviously, lose the "translator:" comments since they are unnecessary)

The translator message now indicates that the remaining %s denotes the
replication slot name.

PSA v17 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v17-0001-Log-messages-for-replication-slot-acquisition-an.patchapplication/octet-stream; name=v17-0001-Log-messages-for-replication-slot-acquisition-an.patchDownload
From eb5b165fb09652baf226292362a0df4fcff5714c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 15 Nov 2023 18:27:30 +0000
Subject: [PATCH v17] Log messages for replication slot acquisition and
 release.

This commit log messages (at LOG level when log_replication_commands is
set, otherwise at DEBUG1 level) when walsenders acquire and release
replication slots. These messages help to know the lifetime of a
replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down. These messages will be
useful on production servers to debug and analyze inactive replication
slots.

Note that these messages are emitted only for walsenders but not for
backends. This is because walsenders are the ones that typically hold
replication slots for longer durations, unlike backends which hold them
for executing replication related functions.

Author: Bharath Rupireddy
Reviewed-by: Peter Smith, Amit Kapila
Discussion: http://postgr.es/m/CALj2ACX17G7F-jeLt+7KhJ6YxVeRwR8Zk0rDh4VnT546o0UpTQ@mail.gmail.com
---
 doc/src/sgml/config.sgml       | 11 ++++++-----
 src/backend/replication/slot.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc35a46e5e..9398afbcbd 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7510,11 +7510,12 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
+        Causes each replication command and <literal>walsender</literal>
+        process's replication slot acquisition/release to be logged in the
+        server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>. Only superusers and users with the appropriate
+        <literal>SET</literal> privilege can change this setting.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 781aa43cc4..2bfdbd5e80 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -537,6 +537,17 @@ retry:
 	 */
 	if (SlotIsLogical(s))
 		pgstat_acquire_replslot(s);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				SlotIsLogical(s)
+		/* translator: %s is name of the replication slot */
+				? errmsg("acquired logical replication slot \"%s\"",
+						 NameStr(s->data.name))
+				: errmsg("acquired physical replication slot \"%s\"",
+						 NameStr(s->data.name)));
+	}
 }
 
 /*
@@ -549,9 +560,17 @@ void
 ReplicationSlotRelease(void)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
+	char	   *slotname = NULL;	/* keep compiler quiet */
+	bool		is_logical = false; /* keep compiler quiet */
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (am_walsender)
+	{
+		slotname = pstrdup(NameStr(slot->data.name));
+		is_logical = SlotIsLogical(slot);
+	}
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
@@ -596,6 +615,19 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				is_logical
+		/* translator: %s is name of the replication slot */
+				? errmsg("released logical replication slot \"%s\"",
+						 slotname)
+				: errmsg("released physical replication slot \"%s\"",
+						 slotname));
+
+		pfree(slotname);
+	}
 }
 
 /*
-- 
2.34.1

#41Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#40)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

Some minor comments for v17-0001.

======

1.
+       ereport(log_replication_commands ? LOG : DEBUG1,
+               SlotIsLogical(s)
+       /* translator: %s is name of the replication slot */
+               ? errmsg("acquired logical replication slot \"%s\"",
+                        NameStr(s->data.name))
+               : errmsg("acquired physical replication slot \"%s\"",
+                        NameStr(s->data.name)));

1a.
FWIW, if the ternary was inside the errmsg, there would be less code
duplication.

~

1b.
I searched HEAD code and did not find any "translator:" comments for
just ordinary slot name substitutions like these; AFAICT that comment
is not necessary anymore.

~

SUGGESTION (#1a and #1b)

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
? "acquired logical replication slot \"%s\""
: "acquired physical replication slot \"%s\"",
NameStr(s->data.name)));

~~~

2.
Ditto for the other ereport.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#42Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#41)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Nov 16, 2023 at 3:48 AM Peter Smith <smithpb2250@gmail.com> wrote:

~

SUGGESTION (#1a and #1b)

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
? "acquired logical replication slot \"%s\""
: "acquired physical replication slot \"%s\"",
NameStr(s->data.name)));

~~~

Personally, I prefer the way Bharath had in his patch. Did you see any
preferred way in the existing code?

--
With Regards,
Amit Kapila.

#43Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#39)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Translation-wise, this doesn't work, because you're building a string.
There's no reason to think that the words "logical" and "physical"
should stay untranslated; the message would make no sense, or at least
would be very ugly.

You should do something like

if (am_walsender)
{
ereport(log_replication_commands ? LOG : DEBUG1,
SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) :
errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name)));
}

(Obviously, lose the "translator:" comments since they are unnecessary)

If you really want to keep the "logical"/"physical" word untranslated,
you need to split it out of the sentence somehow. But it would be
really horrible IMO. Like

errmsg("acquired replication slot \"%s\" of type \"%s\"",
NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical")

Thanks for the suggestion. I would like to clarify on this a bit. What
do exactly mean by splitting out of the sentence? For example, in one
of the existing messages:

ereport(LOG,
/* translator: %s is SIGKILL or SIGABRT */
(errmsg("issuing %s to recalcitrant children",
send_abort_for_kill ? "SIGABRT" : "SIGKILL")));

Do here words SIGABRT/SIGKILL remain untranslated due to the
translator's comment? I thought this was similar to the message being
proposed but seems like this message construction follows translation
rules better.

--
With Regards,
Amit Kapila.

#44Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#43)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Nov 16, 2023 at 12:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Translation-wise, this doesn't work, because you're building a string.
There's no reason to think that the words "logical" and "physical"
should stay untranslated; the message would make no sense, or at least
would be very ugly.

You should do something like

if (am_walsender)
{
ereport(log_replication_commands ? LOG : DEBUG1,
SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) :
errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name)));
}

(Obviously, lose the "translator:" comments since they are unnecessary)

If you really want to keep the "logical"/"physical" word untranslated,
you need to split it out of the sentence somehow. But it would be
really horrible IMO. Like

errmsg("acquired replication slot \"%s\" of type \"%s\"",
NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical")

Thanks for the suggestion. I would like to clarify on this a bit. What
do exactly mean by splitting out of the sentence? For example, in one
of the existing messages:

ereport(LOG,
/* translator: %s is SIGKILL or SIGABRT */
(errmsg("issuing %s to recalcitrant children",
send_abort_for_kill ? "SIGABRT" : "SIGKILL")));

Do here words SIGABRT/SIGKILL remain untranslated due to the
translator's comment? I thought this was similar to the message being
proposed but seems like this message construction follows translation
rules better.

IIUC, that example is different because "SIGABRT" / "SIGKILL" are not
real words, so you don't want the translator to attempt to translate
them.You want them to appear in the message as-is.

OTOH in this patch "logical" and "physical" are just normal English
words that should be translated as part of the original message.
e.g. like in these similar messages:
- msgid "database \"%s\" is used by an active logical replication slot"
- msgstr "la base de données « %s » est utilisée par un slot de
réplication logique actif"

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#45Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#42)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Nov 16, 2023 at 12:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Nov 16, 2023 at 3:48 AM Peter Smith <smithpb2250@gmail.com> wrote:

~

SUGGESTION (#1a and #1b)

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
? "acquired logical replication slot \"%s\""
: "acquired physical replication slot \"%s\"",
NameStr(s->data.name)));

~~~

Personally, I prefer the way Bharath had in his patch. Did you see any
preferred way in the existing code?

Not really. I think the errmsg combined with ternary is not so common.
I couldn't find many examples, so I wouldn't try to claim anything is
a "preferred" way

There are some existing examples, like Bharath had:

ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
collencoding == -1
? errmsg("collation \"%s\" already exists, skipping",
collname)
: errmsg("collation \"%s\" for encoding \"%s\" already
exists, skipping",
collname, pg_encoding_to_char(collencoding))));

OTOH, when there are different numbers of substitution parameters in
each of the errmsg like that, you don't have much choice but to do it
that way.

I am fine with whatever is chosen -- I was only offering an alternative.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#46Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Peter Smith (#41)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On 2023-Nov-16, Peter Smith wrote:

I searched HEAD code and did not find any "translator:" comments for
just ordinary slot name substitutions like these; AFAICT that comment
is not necessary anymore.

True. Lose that.

The rationale I have is to consider whether a translator looking at the
original message message in isolation is going to understand what the %s
means. If it's possible to tell what it is without having to go read
the source code that leads to the message, then you don't need a
"translator:" comment. Otherwise you do.

You also need to assume the translator is not stupid, but that seems an
OK assumption.

SUGGESTION (#1a and #1b)

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
? "acquired logical replication slot \"%s\""
: "acquired physical replication slot \"%s\"",
NameStr(s->data.name)));

The bad thing about this is that gettext() is not going to pick up these
strings into the translation catalog. You could fix that by adding
gettext_noop() calls around them:

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
? gettext_noop("acquired logical replication slot \"%s\"")
: gettext_noop("acquired physical replication slot \"%s\""),
NameStr(s->data.name)));

but at that point it's not clear that it's really better than putting
the ternary in the outer scope.

You can verify this by doing "make update-po" and then searching for the
messages in postgres.pot.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)

#47Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#46)
1 attachment(s)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Nov 16, 2023 at 4:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Nov-16, Peter Smith wrote:

I searched HEAD code and did not find any "translator:" comments for
just ordinary slot name substitutions like these; AFAICT that comment
is not necessary anymore.

True. Lose that.

Removed.

The rationale I have is to consider whether a translator looking at the
original message message in isolation is going to understand what the %s
means. If it's possible to tell what it is without having to go read
the source code that leads to the message, then you don't need a
"translator:" comment. Otherwise you do.

Agree. I think it's easy for one to guess the slot name follows "....
replication slot %s", so I removed the translator message.

SUGGESTION (#1a and #1b)

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
? "acquired logical replication slot \"%s\""
: "acquired physical replication slot \"%s\"",
NameStr(s->data.name)));

The bad thing about this is that gettext() is not going to pick up these
strings into the translation catalog. You could fix that by adding
gettext_noop() calls around them:

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
? gettext_noop("acquired logical replication slot \"%s\"")
: gettext_noop("acquired physical replication slot \"%s\""),
NameStr(s->data.name)));

but at that point it's not clear that it's really better than putting
the ternary in the outer scope.

I retained wrapping messages in errmsg("...").

You can verify this by doing "make update-po" and then searching for the
messages in postgres.pot.

Translation gives me [1]#: replication/slot.c:545 #, c-format msgid "acquired logical replication slot \"%s\"" msgstr "" with v18 patch

PSA v18 patch.

[1]: #: replication/slot.c:545 #, c-format msgid "acquired logical replication slot \"%s\"" msgstr ""
#: replication/slot.c:545
#, c-format
msgid "acquired logical replication slot \"%s\""
msgstr ""

#: replication/slot.c:547
#, c-format
msgid "acquired physical replication slot \"%s\""
msgstr ""

#: replication/slot.c:622
#, c-format
msgid "released logical replication slot \"%s\""
msgstr ""

#: replication/slot.c:624
#, c-format
msgid "released physical replication slot \"%s\""
msgstr ""

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v18-0001-Log-messages-for-replication-slot-acquisition-an.patchapplication/octet-stream; name=v18-0001-Log-messages-for-replication-slot-acquisition-an.patchDownload
From 8e0fe98061ad5c49f42f8595f157d312daad7c04 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 16 Nov 2023 12:26:10 +0000
Subject: [PATCH v18] Log messages for replication slot acquisition and
 release.

This commit log messages (at LOG level when log_replication_commands is
set, otherwise at DEBUG1 level) when walsenders acquire and release
replication slots. These messages help to know the lifetime of a
replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down. These messages will be
useful on production servers to debug and analyze inactive replication
slots.

Note that these messages are emitted only for walsenders but not for
backends. This is because walsenders are the ones that typically hold
replication slots for longer durations, unlike backends which hold them
for executing replication related functions.

Author: Bharath Rupireddy
Reviewed-by: Peter Smith, Amit Kapila, Alvaro Herrera
Discussion: http://postgr.es/m/CALj2ACX17G7F-jeLt+7KhJ6YxVeRwR8Zk0rDh4VnT546o0UpTQ@mail.gmail.com
---
 doc/src/sgml/config.sgml       | 11 ++++++-----
 src/backend/replication/slot.c | 30 ++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc35a46e5e..9398afbcbd 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7510,11 +7510,12 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
       </term>
       <listitem>
        <para>
-        Causes each replication command to be logged in the server log.
-        See <xref linkend="protocol-replication"/> for more information about
-        replication command. The default value is <literal>off</literal>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
+        Causes each replication command and <literal>walsender</literal>
+        process's replication slot acquisition/release to be logged in the
+        server log. See <xref linkend="protocol-replication"/> for more
+        information about replication command. The default value is
+        <literal>off</literal>. Only superusers and users with the appropriate
+        <literal>SET</literal> privilege can change this setting.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 781aa43cc4..18bc28195b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -537,6 +537,16 @@ retry:
 	 */
 	if (SlotIsLogical(s))
 		pgstat_acquire_replslot(s);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				SlotIsLogical(s)
+				? errmsg("acquired logical replication slot \"%s\"",
+						 NameStr(s->data.name))
+				: errmsg("acquired physical replication slot \"%s\"",
+						 NameStr(s->data.name)));
+	}
 }
 
 /*
@@ -549,9 +559,17 @@ void
 ReplicationSlotRelease(void)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
+	char	   *slotname = NULL;	/* keep compiler quiet */
+	bool		is_logical = false; /* keep compiler quiet */
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (am_walsender)
+	{
+		slotname = pstrdup(NameStr(slot->data.name));
+		is_logical = SlotIsLogical(slot);
+	}
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
@@ -596,6 +614,18 @@ ReplicationSlotRelease(void)
 	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
 	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 	LWLockRelease(ProcArrayLock);
+
+	if (am_walsender)
+	{
+		ereport(log_replication_commands ? LOG : DEBUG1,
+				is_logical
+				? errmsg("released logical replication slot \"%s\"",
+						 slotname)
+				: errmsg("released physical replication slot \"%s\"",
+						 slotname));
+
+		pfree(slotname);
+	}
 }
 
 /*
-- 
2.34.1

#48Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#47)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Nov 16, 2023 at 6:09 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Nov 16, 2023 at 4:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Nov-16, Peter Smith wrote:

I searched HEAD code and did not find any "translator:" comments for
just ordinary slot name substitutions like these; AFAICT that comment
is not necessary anymore.

True. Lose that.

Removed.

The rationale I have is to consider whether a translator looking at the
original message message in isolation is going to understand what the %s
means. If it's possible to tell what it is without having to go read
the source code that leads to the message, then you don't need a
"translator:" comment. Otherwise you do.

Agree. I think it's easy for one to guess the slot name follows "....
replication slot %s", so I removed the translator message.

SUGGESTION (#1a and #1b)

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
? "acquired logical replication slot \"%s\""
: "acquired physical replication slot \"%s\"",
NameStr(s->data.name)));

The bad thing about this is that gettext() is not going to pick up these
strings into the translation catalog. You could fix that by adding
gettext_noop() calls around them:

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
? gettext_noop("acquired logical replication slot \"%s\"")
: gettext_noop("acquired physical replication slot \"%s\""),
NameStr(s->data.name)));

but at that point it's not clear that it's really better than putting
the ternary in the outer scope.

I retained wrapping messages in errmsg("...").

You can verify this by doing "make update-po" and then searching for the
messages in postgres.pot.

Translation gives me [1] with v18 patch

PSA v18 patch.

LGTM. I'll push this early next week unless there are further
suggestions or comments.

--
With Regards,
Amit Kapila.

#49Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#48)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Sat, Nov 18, 2023 at 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Nov 16, 2023 at 6:09 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

PSA v18 patch.

LGTM. I'll push this early next week unless there are further
suggestions or comments.

Pushed.

--
With Regards,
Amit Kapila.