Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

Started by Bharath Rupireddyover 4 years ago49 messageshackers
Jump to latest
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com

Hi,

The active_pid of ReplicationSlot structure, which tells whether a
replication slot is active or inactive, isn't persisted to the disk
i.e has no entry in ReplicationSlotPersistentData structure. Isn't it
better if we add that info to ReplicationSlotPersistentData structure
and persist to the disk? This will help to know what were the inactive
replication slots in case the server goes down or crashes for some
reason. Currently, we don't have a way to interpret the replication
slot info in the disk but there's a patch for pg_replslotdata tool at
[1]: /messages/by-id/CALj2ACW0rV5gWK8A3m6_X62qH+Vfaq5hznC=i0R5Wojt5+yhyw@mail.gmail.com
down/crash and figure out which replication slots to remove to bring
the server up and running without touching the other replication
slots.

Thoughts?

[1]: /messages/by-id/CALj2ACW0rV5gWK8A3m6_X62qH+Vfaq5hznC=i0R5Wojt5+yhyw@mail.gmail.com

Regards,
Bharath Rupireddy.

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bharath Rupireddy (#1)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On 2021-Dec-01, Bharath Rupireddy wrote:

The active_pid of ReplicationSlot structure, which tells whether a
replication slot is active or inactive, isn't persisted to the disk
i.e has no entry in ReplicationSlotPersistentData structure. Isn't it
better if we add that info to ReplicationSlotPersistentData structure
and persist to the disk? This will help to know what were the inactive
replication slots in case the server goes down or crashes for some
reason. Currently, we don't have a way to interpret the replication
slot info in the disk but there's a patch for pg_replslotdata tool at
[1]. This way, one can figure out the reasons for the server
down/crash and figure out which replication slots to remove to bring
the server up and running without touching the other replication
slots.

I think the PIDs are log-worthy for sure, but it's not clear to me that
it is desirable to write them to the persistent state file. In case of
crashes, the log should serve just fine to aid root cause investigation
-- in fact even better than the persistent file, where the data would be
lost as soon as the next client acquires that slot.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"The Postgresql hackers have what I call a "NASA space shot" mentality.
Quite refreshing in a world of "weekend drag racer" developers."
(Scott Marlowe)

#3Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#2)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Dec-01, Bharath Rupireddy wrote:

The active_pid of ReplicationSlot structure, which tells whether a
replication slot is active or inactive, isn't persisted to the disk
i.e has no entry in ReplicationSlotPersistentData structure. Isn't it
better if we add that info to ReplicationSlotPersistentData structure
and persist to the disk? This will help to know what were the inactive
replication slots in case the server goes down or crashes for some
reason. Currently, we don't have a way to interpret the replication
slot info in the disk but there's a patch for pg_replslotdata tool at
[1]. This way, one can figure out the reasons for the server
down/crash and figure out which replication slots to remove to bring
the server up and running without touching the other replication
slots.

I think the PIDs are log-worthy for sure, but it's not clear to me that
it is desirable to write them to the persistent state file. In case of
crashes, the log should serve just fine to aid root cause investigation
-- in fact even better than the persistent file, where the data would be
lost as soon as the next client acquires that slot.

Thanks. +1 to log a message at LOG level whenever a replication slot
becomes active (gets assigned a valid pid to active_pid) and becomes
inactive(gets assigned 0 to active_pid). Having said that, isn't it
also helpful if we write a bool (1 byte character) whenever the slot
becomes active and inactive to the disk?

Regards,
Bharath Rupireddy.

#4Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#3)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Dec-01, Bharath Rupireddy wrote:

The active_pid of ReplicationSlot structure, which tells whether a
replication slot is active or inactive, isn't persisted to the disk
i.e has no entry in ReplicationSlotPersistentData structure. Isn't it
better if we add that info to ReplicationSlotPersistentData structure
and persist to the disk? This will help to know what were the inactive
replication slots in case the server goes down or crashes for some
reason. Currently, we don't have a way to interpret the replication
slot info in the disk but there's a patch for pg_replslotdata tool at
[1]. This way, one can figure out the reasons for the server
down/crash and figure out which replication slots to remove to bring
the server up and running without touching the other replication
slots.

I think the PIDs are log-worthy for sure, but it's not clear to me that
it is desirable to write them to the persistent state file. In case of
crashes, the log should serve just fine to aid root cause investigation
-- in fact even better than the persistent file, where the data would be
lost as soon as the next client acquires that slot.

Thanks. +1 to log a message at LOG level whenever a replication slot
becomes active (gets assigned a valid pid to active_pid) and becomes
inactive(gets assigned 0 to active_pid). Having said that, isn't it
also helpful if we write a bool (1 byte character) whenever the slot
becomes active and inactive to the disk?

Here's the patch that adds a LOG message whenever a replication slot
becomes active and inactive. These logs will be extremely useful on
production servers to debug and analyze inactive replication slot
issues.

Thoughts?

Regards,
Bharath Rupireddy.

Attachments:

v1-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v1-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload+39-1
#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#4)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

At Tue, 14 Dec 2021 19:04:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I think the PIDs are log-worthy for sure, but it's not clear to me that
it is desirable to write them to the persistent state file. In case of
crashes, the log should serve just fine to aid root cause investigation
-- in fact even better than the persistent file, where the data would be
lost as soon as the next client acquires that slot.

Thanks. +1 to log a message at LOG level whenever a replication slot
becomes active (gets assigned a valid pid to active_pid) and becomes
inactive(gets assigned 0 to active_pid). Having said that, isn't it
also helpful if we write a bool (1 byte character) whenever the slot
becomes active and inactive to the disk?

Here's the patch that adds a LOG message whenever a replication slot
becomes active and inactive. These logs will be extremely useful on
production servers to debug and analyze inactive replication slot
issues.

Thoughts?

If I create a replication slot, I saw the following lines in server log.

[22054:client backend] LOG: replication slot "s1" becomes active
[22054:client backend] DETAIL: The process with PID 22054 acquired it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
[22054:client backend] LOG: replication slot "s1" becomes inactive
[22054:client backend] DETAIL: The process with PID 22054 released it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');

They are apparently too much as if they were DEBUG3 lines. The
process PID shown is of the process the slot operations took place so
I think it conveys no information. The STATEMENT lines are also noisy
for non-ERROR emssages. Couldn't we hide that line?

That is, how about making the log lines as simple as the follows?

[17156:walsender] LOG: acquired replication slot "s1"
[17156:walsender] LOG: released replication slot "s1"

I think in the first place we don't need this log lines at slot
creation since it is actually not acquirement nor releasing of a slot.

It behaves in a strange way when executing pg_basebackup.

[22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active
[22864:walsender] DETAIL: The process with PID 22864 acquired it.
[22864:walsender] STATEMENT: CREATE_REPLICATION_SLOT "pg_basebackup_22864" TEMPORARY PHYSICAL ( RESERVE_WAL)
[22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active
[22864:walsender] DETAIL: The process with PID 22864 acquired it.
[22864:walsender] STATEMENT: START_REPLICATION SLOT "pg_basebackup_22864" 0/6000000 TIMELINE 1
[22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes inactive
[22864:walsender] DETAIL: The process with PID 22864 released it.

The slot is acquired twice then released once. It is becuase the
patch doesn't emit "becomes inactive" line when releasing a temporary
slot. However, I'd rather think we don't need the first 'become
active' line like the previous example.

@@ -658,6 +690,13 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	slot->active_pid = 0;
 	slot->in_use = false;
 	LWLockRelease(ReplicationSlotControlLock);
+
+	if (pid > 0)
+		ereport(LOG,
+				(errmsg("replication slot \"%s\" becomes inactive",
+						NameStr(slot->data.name)),
+				 errdetail("The process with PID %d released it.", pid)));
+

This is wrong. I see a "become inactive" message if I droped an
"inactive" replication slot. The reason the inactive slot looks as if
it were acquired is it is temporarily aquired as a preparing step of
dropping.

Even assuming that the log lines are simplified to this extent, I
still see it a bit strange that the "becomes acitve (or acruied)"
message shows alone without having a message like "replication
connection accepted". But that would be another issue even if it is
true.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#6Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#5)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Wed, Dec 15, 2021 at 8:32 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Here's the patch that adds a LOG message whenever a replication slot
becomes active and inactive. These logs will be extremely useful on
production servers to debug and analyze inactive replication slot
issues.

Thoughts?

If I create a replication slot, I saw the following lines in server log.

[22054:client backend] LOG: replication slot "s1" becomes active
[22054:client backend] DETAIL: The process with PID 22054 acquired it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
[22054:client backend] LOG: replication slot "s1" becomes inactive
[22054:client backend] DETAIL: The process with PID 22054 released it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');

They are apparently too much as if they were DEBUG3 lines. The
process PID shown is of the process the slot operations took place so
I think it conveys no information. The STATEMENT lines are also noisy
for non-ERROR emssages. Couldn't we hide that line?

That is, how about making the log lines as simple as the follows?

[17156:walsender] LOG: acquired replication slot "s1"
[17156:walsender] LOG: released replication slot "s1"

Thanks for taking a look at the patch. Here's what I've come up with:

for drops:
2021-12-28 06:39:34.963 UTC [2541600] LOG: acquired persistent
physical replication slot "myslot1"
2021-12-28 06:39:34.980 UTC [2541600] LOG: dropped persistent
physical replication slot "myslot1"
2021-12-28 06:47:39.994 UTC [2544153] LOG: acquired persistent
logical replication slot "myslot2"
2021-12-28 06:47:40.003 UTC [2544153] LOG: dropped persistent logical
replication slot "myslot2"

for creates:
2021-12-28 06:39:46.859 UTC [2541600] LOG: created persistent
physical replication slot "myslot1"
2021-12-28 06:39:46.859 UTC [2541600] LOG: released persistent
physical replication slot "myslot1"
2021-12-28 06:45:20.037 UTC [2544153] LOG: created persistent logical
replication slot "myslot2"
2021-12-28 06:45:20.058 UTC [2544153] LOG: released persistent
logical replication slot "myslot2"

for pg_basebackup:
2021-12-28 06:41:04.601 UTC [2542686] LOG: created temporary physical
replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.602 UTC [2542686] LOG: released temporary
physical replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.602 UTC [2542686] LOG: acquired temporary
physical replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.867 UTC [2542686] LOG: released temporary
physical replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.954 UTC [2542686] LOG: dropped temporary physical
replication slot "pg_basebackup_2542686"

The amount of logs may seem noisy, but they do help a lot given the
fact that the server generates much more noise, for instance, the
server logs the syntax error statements. And, the replication slots
don't get created/dropped every now and then (at max, the
pg_basebackup if at all used and configured to take the backups, say,
every 24hrs or so). With the above set of logs debugging for inactive
replication slots becomes easier. One can find the root cause and
answer questions like "why there was a huge WAL file growth at some
point or when did a replication slot become inactive or how much time
a replication slot was inactive or when did a standby disconnected and
connected again or when did a pg_receivewal or pg_recvlogical
connected and disconnected so on.".

Here's the v2 patch. Please provide your thoughts.

Regards,
Bharath Rupireddy.

Attachments:

v2-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v2-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload+28-1
#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#6)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

At Tue, 28 Dec 2021 12:28:07 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

On Wed, Dec 15, 2021 at 8:32 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Here's the patch that adds a LOG message whenever a replication slot
becomes active and inactive. These logs will be extremely useful on
production servers to debug and analyze inactive replication slot
issues.

Thoughts?

If I create a replication slot, I saw the following lines in server log.

[22054:client backend] LOG: replication slot "s1" becomes active
[22054:client backend] DETAIL: The process with PID 22054 acquired it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
[22054:client backend] LOG: replication slot "s1" becomes inactive
[22054:client backend] DETAIL: The process with PID 22054 released it.
[22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');

They are apparently too much as if they were DEBUG3 lines. The
process PID shown is of the process the slot operations took place so
I think it conveys no information. The STATEMENT lines are also noisy
for non-ERROR emssages. Couldn't we hide that line?

That is, how about making the log lines as simple as the follows?

[17156:walsender] LOG: acquired replication slot "s1"
[17156:walsender] LOG: released replication slot "s1"

Thanks for taking a look at the patch. Here's what I've come up with:

for drops:

(two log lines per slot: acquire->drop)

for creates:

(two log lines per slot: create->release)
..

Theses are still needlessly verbose. Even for those who want slot
activities to be logged are not interested in this detail. This is
still debug logs in that sense.

The amount of logs may seem noisy, but they do help a lot given the
fact that the server generates much more noise, for instance, the
server logs the syntax error statements. And, the replication slots
don't get created/dropped every now and then (at max, the
pg_basebackup if at all used and configured to take the backups, say,
every 24hrs or so). With the above set of logs debugging for inactive

In a nearby thread, there was a discussion that checkpoint logs are
too noisy to defaultly turn on. Finally it is committed but I don't
think this is committed as is as it is more verbose (IMV) than the
checkpoint logs. Thus this logs need to be muteable. Meanwhile I
don't think we willingly add a new knob for this feature. I think we
can piggy-back on log_replication_commands for the purpose, changing
its meaning slightly to "log replication commands and related
activity".

replication slots becomes easier. One can find the root cause and
answer questions like "why there was a huge WAL file growth at some
point or when did a replication slot become inactive or how much time
a replication slot was inactive or when did a standby disconnected and
connected again or when did a pg_receivewal or pg_recvlogical
connected and disconnected so on.".

I don't deny it is useful in that cases. If you are fine that the
logs are debug-only, making them DEBUG1 would work. But I don't think
you are fine with that since I think you are going to turn on them on
production systems.

Here's the v2 patch. Please provide your thoughts.

Thanks! I have three comments on this version.

- I still think "acquire/release" logs on creation/dropping should be
silenced. Adding the third parameter to ReplicationSlotAcquire()
that can mute the acquiring (and as well as the corresponding
releasing) log will work.

- Need to mute the logs by log_replication_commands. (We could add
another choice for the variable for this purpose but I think we
don't need it.)

- The messages are not translatable as the variable parts are
adjectives. They need to consist of static sentences. The
combinations of the two properties are 6 (note that persistence is
tristate) but I don't think we accept that complexity for the
information. So I recommend to just remove the variable parts from
the messages.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#8Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#7)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

Here's the v2 patch. Please provide your thoughts.

Thanks! I have three comments on this version.

Thanks for your review.

- I still think "acquire/release" logs on creation/dropping should be
silenced. Adding the third parameter to ReplicationSlotAcquire()
that can mute the acquiring (and as well as the corresponding
releasing) log will work.

Done.

can piggy-back on log_replication_commands for the purpose, changing
its meaning slightly to "log replication commands and related
activity".
- Need to mute the logs by log_replication_commands. (We could add
another choice for the variable for this purpose but I think we
don't need it.)

Done.

- The messages are not translatable as the variable parts are
adjectives. They need to consist of static sentences. The
combinations of the two properties are 6 (note that persistence is
tristate) but I don't think we accept that complexity for the
information. So I recommend to just remove the variable parts from
the messages.

Done.

Here's v3, please review it further.

Regards,
Bharath Rupireddy.

Attachments:

v3-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/x-patch; name=v3-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload+47-26
#9Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#8)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

On Mon, Jan 10, 2022 at 6:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Here's the v2 patch. Please provide your thoughts.

Thanks! I have three comments on this version.

Thanks for your review.

- I still think "acquire/release" logs on creation/dropping should be
silenced. Adding the third parameter to ReplicationSlotAcquire()
that can mute the acquiring (and as well as the corresponding
releasing) log will work.

Done.

can piggy-back on log_replication_commands for the purpose, changing
its meaning slightly to "log replication commands and related
activity".
- Need to mute the logs by log_replication_commands. (We could add
another choice for the variable for this purpose but I think we
don't need it.)

Done.

- The messages are not translatable as the variable parts are
adjectives. They need to consist of static sentences. The
combinations of the two properties are 6 (note that persistence is
tristate) but I don't think we accept that complexity for the
information. So I recommend to just remove the variable parts from
the messages.

Done.

Here's v3, please review it further.

Here's the rebased v4 patch.

Regards,
Bharath Rupireddy.

Attachments:

v4-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v4-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload+48-31
#10Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#9)
add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Sun, Feb 27, 2022 at 3:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Jan 10, 2022 at 6:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Here's the v2 patch. Please provide your thoughts.

Thanks! I have three comments on this version.

Thanks for your review.

- I still think "acquire/release" logs on creation/dropping should be
silenced. Adding the third parameter to ReplicationSlotAcquire()
that can mute the acquiring (and as well as the corresponding
releasing) log will work.

Done.

can piggy-back on log_replication_commands for the purpose, changing
its meaning slightly to "log replication commands and related
activity".
- Need to mute the logs by log_replication_commands. (We could add
another choice for the variable for this purpose but I think we
don't need it.)

Done.

- The messages are not translatable as the variable parts are
adjectives. They need to consist of static sentences. The
combinations of the two properties are 6 (note that persistence is
tristate) but I don't think we accept that complexity for the
information. So I recommend to just remove the variable parts from
the messages.

Done.

Here's v3, please review it further.

Here's the rebased v4 patch.

Here's the rebased v5 patch.

Regards,
Bharath Rupireddy.

Attachments:

v5-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v5-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload+49-31
#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bharath Rupireddy (#10)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

If I read the code right, this patch emits logs when
pg_logical_slot_get_changes and pg_replication_slot_advance SQL
functions are called. Is this desirable/useful, for the use case you
stated at start of thread? I think it is most likely pointless. If you
get rid of those, then the only acquisitions that would log messages are
those in StartReplication and StartLogicalReplication. So I wonder if
it would be better to leave the API of ReplicationSlotAcquire() alone,
and instead make StartReplication and StartLogicalReplication
responsible for those messages.

I didn't look at the release-side messages you're adding, but I suppose
it should be symmetrical.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El hombre nunca sabe de lo que es capaz hasta que lo intenta" (C. Dickens)

#12Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#11)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

If I read the code right, this patch emits logs when
pg_logical_slot_get_changes and pg_replication_slot_advance SQL
functions are called. Is this desirable/useful, for the use case you
stated at start of thread? I think it is most likely pointless. If you
get rid of those, then the only acquisitions that would log messages are
those in StartReplication and StartLogicalReplication. So I wonder if
it would be better to leave the API of ReplicationSlotAcquire() alone,
and instead make StartReplication and StartLogicalReplication
responsible for those messages.

I didn't look at the release-side messages you're adding, but I suppose
it should be symmetrical.

Adding the messages right after ReplicationSlotAcquire in the
StartReplication and StartLogicalReplication looks okay, but, if we
just add "released replication slot \"%s\" for logical/physical
replication". in StartReplication and StartLogicalReplication right
after ReplicationSlotRelease, how about ReplicationSlotRelease in
WalSndErrorCleanup and ReplicationSlotShmemExit?

The whole idea is to get to know when and how much time a slot was
inactive/unused when log_replication_commands is set to true.

We can think of adding a timestamp column to on-disk replication slot
data but that will be too much.

Regards,
Bharath Rupireddy.

#13Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#12)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, May 5, 2022 at 2:57 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

If I read the code right, this patch emits logs when
pg_logical_slot_get_changes and pg_replication_slot_advance SQL
functions are called. Is this desirable/useful, for the use case you
stated at start of thread? I think it is most likely pointless. If you
get rid of those, then the only acquisitions that would log messages are
those in StartReplication and StartLogicalReplication. So I wonder if
it would be better to leave the API of ReplicationSlotAcquire() alone,
and instead make StartReplication and StartLogicalReplication
responsible for those messages.

I didn't look at the release-side messages you're adding, but I suppose
it should be symmetrical.

Here's the v6 patch, a much simpler one - no changes to any of the
existing function APIs. Please see the sample logs at [1]2022-07-25 12:30:14.847 UTC [152873] LOG: acquired physical replication slot "foo" 2022-07-25 12:30:20.878 UTC [152873] LOG: released physical replication slot "foo". There's a
bit of duplicate code in the v6 patch, if the overall approach looks
okay, I can remove that too in the next version of the patch.

Thoughts?

[1]: 2022-07-25 12:30:14.847 UTC [152873] LOG: acquired physical replication slot "foo" 2022-07-25 12:30:20.878 UTC [152873] LOG: released physical replication slot "foo"
2022-07-25 12:30:14.847 UTC [152873] LOG: acquired physical
replication slot "foo"
2022-07-25 12:30:20.878 UTC [152873] LOG: released physical
replication slot "foo"

2022-07-25 12:49:18.023 UTC [168738] LOG: acquired logical
replication slot "bar"
2022-07-25 12:49:28.105 UTC [168738] LOG: released logical
replication slot "bar"

Regards,
Bharath Rupireddy.

Attachments:

v6-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/x-patch; name=v6-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload+64-4
#14Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#13)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Mon, Jul 25, 2022 at 6:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Here's the v6 patch, a much simpler one - no changes to any of the
existing function APIs. Please see the sample logs at [1]. There's a
bit of duplicate code in the v6 patch, if the overall approach looks
okay, I can remove that too in the next version of the patch.

I modified the log_replication_commands description in guc_tables.c.
Please review the v7 patch further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v7-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v7-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload+57-5
#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#14)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Sep 15, 2022 at 10:39 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, Jul 25, 2022 at 6:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Here's the v6 patch, a much simpler one - no changes to any of the
existing function APIs. Please see the sample logs at [1]. There's a
bit of duplicate code in the v6 patch, if the overall approach looks
okay, I can remove that too in the next version of the patch.

I modified the log_replication_commands description in guc_tables.c.
Please review the v7 patch further.

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

--
With Regards,
Amit Kapila.

#16Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#14)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

Hi, I had a quick look at the v7 patch.

You might consider to encapsulate some of this logic in new functions like:

void
LogReplicationSlotAquired(bool is_physical, char *slotname)
{
loglevel = log_replication_commands ? LOG : DEBUG3;

if (is_physical)
ereport(loglevel,
(errmsg("acquired physical replication slot \"%s\"", slotname)));
else
ereport(loglevel,
(errmsg("acquired logical replication slot \"%s\"", slotname)));
}

void
LogReplicationSlotReleased(bool is_physical, char *slotname)
{
loglevel = log_replication_commands ? LOG : DEBUG3;

if (is_physical)
ereport(loglevel,
(errmsg("released physical replication slot \"%s\"", slotname)));
else
ereport(loglevel,
(errmsg("released logical replication slot \"%s\"", slotname)));
}

~~

THEN

ReplicationSlotShmemExit and WalSndErrorCleanup can call it like:
if (MyReplicationSlot != NULL)
{
bool is_physical = SlotIsPhysical(MyReplicationSlot);
char *slotname = pstrdup(NameStr(MyReplicationSlot->data.name));

ReplicationSlotRelease();
LogReplicationSlotReleased(is_physical, slotname);
}

~

StartlReplication can call like:
LogReplicationSlotAquired(true, cmd->slotname);
...
LogReplicationSlotReleased(true, cmd->slotname);

~

StartLogicalReplication can call like:
LogReplicationSlotAquired(false, cmd->slotname);
...
LogReplicationSlotReleased(false, cmd->slotname);

~~~

TBH, I am not sure for the *current* code if the encapsulation is
worth the trouble or not. But maybe at least it helps message
consistency and will make it easier if future callers are needed. I
guess those functions could also give you some central point to
comment the intent of this logging? Feel free to take or leave this
code suggestion.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#15)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On 2023-Mar-22, Amit Kapila wrote:

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

I like Peter Smith's suggestion downthread.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#18Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#17)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Mar-22, Amit Kapila wrote:

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

I like Peter Smith's suggestion downthread.

+1. Please review the attached v8 patch further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v8-0001-Add-LOG-messages-when-replication-slots-become-ac.patchapplication/octet-stream; name=v8-0001-Add-LOG-messages-when-replication-slots-become-ac.patchDownload+69-5
#19Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#18)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Fri, Mar 24, 2023 at 4:33 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Mar-22, Amit Kapila wrote:

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

I like Peter Smith's suggestion downthread.

+1. Please review the attached v8 patch further.

Patch v8 applied OK, and builds/renders the HTML docs OK, and passes
the regression and subscription TAP tests OK.

(Note - I didn't do any additional manual testing, and I've assumed it
to be covering all the necessary acquire/related logging that you
wanted).

~~

Here are some minor comments:

1.
+ ereport(log_replication_commands ? LOG : DEBUG3,
+ (errmsg("acquired physical replication slot \"%s\"",
+ slotname)));

AFAIK those extra parentheses wrapping the "errmsg" part are not necessary.

~~

2.
extern void LogReplicationSlotAquired(bool is_physical, char *slotname);
extern void LogReplicationSlotReleased(bool is_physical, char *slotname);

The "char *slotname" params of those helper functions should probably
be declared and defined as "const char *slotname".

~~

Otherwise, from a code review perspective the patch v8 LGTM.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#20Euler Taveira
euler@eulerto.com
In reply to: Bharath Rupireddy (#18)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

On Thu, Mar 23, 2023, at 2:33 PM, Bharath Rupireddy wrote:

On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Mar-22, Amit Kapila wrote:

I see that you have modified the patch to address the comments from
Alvaro. Personally, I feel it would be better to add such a message at
a centralized location instead of spreading these in different callers
of slot acquire/release functionality to avoid getting these missed in
the new callers in the future. However, if Alvaro and others think
that the current style is better then we should go ahead and do it
that way. I hope that we should be able to decide on this and get it
into PG16. Anyone else would like to weigh in here?

I like Peter Smith's suggestion downthread.

+1. Please review the attached v8 patch further.

If you are adding separate functions as suggested, you should add a comment at
the top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions
saying that LogReplicationSlotAquired() and LogReplicationSlotReleased()
functions should be called respectively after it.

My suggestion is that the functions should have the same name with a "Log"
prefix. On of them has a typo "Aquired" in its name. Hence,
LogReplicationSlotAcquire() and LogReplicationSlotRelease() as names. It is
easier to find if someone is grepping by the origin function.

I prefer a sentence that includes a verb.

physical replication slot \"%s\" is acquired
logical replication slot \"%s\" is released

Isn't the PID important for this use case? If so, of course, you can rely on
log_line_prefix (%p) but if the PID is crucial for an investigation then it
should also be included in the message.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#21Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Euler Taveira (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Bharath Rupireddy (#21)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#22)
#24Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#23)
#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#24)
#26Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#25)
#27Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Peter Smith (#26)
#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#27)
#29Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#27)
#30Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#28)
#31Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#30)
#32Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#31)
#33Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Peter Smith (#32)
#34Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#33)
#35Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#34)
#36Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#35)
#37Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#35)
#38Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#36)
#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#38)
#40Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#39)
#41Peter Smith
smithpb2250@gmail.com
In reply to: Bharath Rupireddy (#40)
#42Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#41)
#43Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#39)
#44Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#43)
#45Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#42)
#46Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Smith (#41)
#47Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#46)
#48Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#47)
#49Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#48)