Track in pg_replication_slots the reason why slots conflict?

Started by Michael Paquierabout 2 years ago33 messages
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,
(Bertrand and Andres in CC.)

While listening at Bertrand's talk about logical decoding on standbys
last week at Prague, I got surprised by the fact that we do not
reflect in the catalogs the reason why a conflict happened for a slot.
There are three of them depending on ReplicationSlotInvalidationCause:
- WAL removed.
- Invalid horizon.
- Insufficient WAL level.

This idea has been hinted around here on the original thread that led
to be87200efd93:
/messages/by-id/d7547f2c-a0c3-6aad-b631-b7ed5efaf298@gmail.com
However v44 has picked up the idea of a boolean:
/messages/by-id/bdc49e0b-cd39-bcd3-e391-b0ad6e48b5cf@gmail.com

ReplicationSlotCtl holds this information, so couldn't it be useful
for monitoring purposes to know why a slot got invalidated and add a
column to pg_get_replication_slots()? This could just be an extra
text conflicting_reason, defaulting to NULL when there's nothing to
see.

Thoughts?
--
Michael

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#1)
Re: Track in pg_replication_slots the reason why slots conflict?

On Thu, Dec 21, 2023 at 5:51 AM Michael Paquier <michael@paquier.xyz> wrote:

While listening at Bertrand's talk about logical decoding on standbys
last week at Prague, I got surprised by the fact that we do not
reflect in the catalogs the reason why a conflict happened for a slot.
There are three of them depending on ReplicationSlotInvalidationCause:
- WAL removed.
- Invalid horizon.
- Insufficient WAL level.

The invalidation cause is also required by one of the features being
discussed "Synchronize slots from primary to standby" [1]/messages/by-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com and there is
already a thread to discuss the same [2]/messages/by-id/CAJpy0uBpr0ym12+0mXpjcRFA6N=anX+Yk9aGU4EJhHNu=fWykQ@mail.gmail.com. As that thread started
yesterday only, you may not have noticed it. Currently, the proposal
is to expose it via a function but we can extend it to also display
via view, feel free to share your opinion on that thread.

[1]: /messages/by-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
[2]: /messages/by-id/CAJpy0uBpr0ym12+0mXpjcRFA6N=anX+Yk9aGU4EJhHNu=fWykQ@mail.gmail.com

--
With Regards,
Amit Kapila.

#3Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#2)
Re: Track in pg_replication_slots the reason why slots conflict?

On Thu, Dec 21, 2023 at 08:20:16AM +0530, Amit Kapila wrote:

The invalidation cause is also required by one of the features being
discussed "Synchronize slots from primary to standby" [1] and there is
already a thread to discuss the same [2]. As that thread started
yesterday only, you may not have noticed it. Currently, the proposal
is to expose it via a function but we can extend it to also display
via view, feel free to share your opinion on that thread.

[1] - /messages/by-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
[2] - /messages/by-id/CAJpy0uBpr0ym12+0mXpjcRFA6N=anX+Yk9aGU4EJhHNu=fWykQ@mail.gmail.com

Ah thanks, missed this one. This cannot use a separate function,
actually, and there is a good reason for that that has not been
mentioned. I'll jump there.
--
Michael

#4Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#1)
Re: Track in pg_replication_slots the reason why slots conflict?

Hi,

On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:

While listening at Bertrand's talk about logical decoding on standbys
last week at Prague, I got surprised by the fact that we do not
reflect in the catalogs the reason why a conflict happened for a slot.
There are three of them depending on ReplicationSlotInvalidationCause:
- WAL removed.
- Invalid horizon.
- Insufficient WAL level.

It should be extremely rare to hit any of these other than "WAL removed", so
I'm not sure it's worth adding interface complexity to show them.

ReplicationSlotCtl holds this information, so couldn't it be useful
for monitoring purposes to know why a slot got invalidated and add a
column to pg_get_replication_slots()? This could just be an extra
text conflicting_reason, defaulting to NULL when there's nothing to
see.

Extra columns aren't free from a usability perspective. IFF we do something, I
think it should be a single column with a cause.

Greetings,

Andres Freund

#5shveta malik
shveta.malik@gmail.com
In reply to: Andres Freund (#4)
Re: Track in pg_replication_slots the reason why slots conflict?

On Thu, Dec 21, 2023 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:

While listening at Bertrand's talk about logical decoding on standbys
last week at Prague, I got surprised by the fact that we do not
reflect in the catalogs the reason why a conflict happened for a slot.
There are three of them depending on ReplicationSlotInvalidationCause:
- WAL removed.
- Invalid horizon.
- Insufficient WAL level.

It should be extremely rare to hit any of these other than "WAL removed", so
I'm not sure it's worth adding interface complexity to show them.

ReplicationSlotCtl holds this information, so couldn't it be useful
for monitoring purposes to know why a slot got invalidated and add a
column to pg_get_replication_slots()? This could just be an extra
text conflicting_reason, defaulting to NULL when there's nothing to
see.

Extra columns aren't free from a usability perspective. IFF we do something, I
think it should be a single column with a cause.

Thanks for the feedback. But do you mean that we replace existing
'conflicting' column with 'cause' in both the function and view
(pg_get_replication_slots() and pg_replication_slots)? Or do you mean
that we expose 'cause' from pg_get_replication_slots() and use that to
display 'conflicting' in pg_replication_slots view?

And if we plan to return/display cause from either function or view,
then shall it be enum 'ReplicationSlotInvalidationCause' or
description/text corresponding to enum?

In the other feature being discussed "Synchronize slots from primary
to standby" [1]/messages/by-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com , there is a requirement to replicate invalidation
cause of slot from the primary to standby and thus it is needed in
enum form there. And thus there was a suggestion earlier to have the
function return enum-value and let the view display it as
text/description to the user. So kindly let us know your thoughts.

[1]: /messages/by-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com

thanks
Shveta

#6Andres Freund
andres@anarazel.de
In reply to: shveta malik (#5)
Re: Track in pg_replication_slots the reason why slots conflict?

Hi,

On 2023-12-21 16:08:48 +0530, shveta malik wrote:

On Thu, Dec 21, 2023 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:

While listening at Bertrand's talk about logical decoding on standbys
last week at Prague, I got surprised by the fact that we do not
reflect in the catalogs the reason why a conflict happened for a slot.
There are three of them depending on ReplicationSlotInvalidationCause:
- WAL removed.
- Invalid horizon.
- Insufficient WAL level.

It should be extremely rare to hit any of these other than "WAL removed", so
I'm not sure it's worth adding interface complexity to show them.

ReplicationSlotCtl holds this information, so couldn't it be useful
for monitoring purposes to know why a slot got invalidated and add a
column to pg_get_replication_slots()? This could just be an extra
text conflicting_reason, defaulting to NULL when there's nothing to
see.

Extra columns aren't free from a usability perspective. IFF we do something, I
think it should be a single column with a cause.

Thanks for the feedback. But do you mean that we replace existing
'conflicting' column with 'cause' in both the function and view
(pg_get_replication_slots() and pg_replication_slots)? Or do you mean
that we expose 'cause' from pg_get_replication_slots() and use that to
display 'conflicting' in pg_replication_slots view?

I'm not entirely sure I understand the difference - just whether we add one
new column or replace the existing 'conflicting' column? I can see arguments
for either. A conflicting column where NULL indicates no conflict, and other
values indicate the reason for the conflict, doesn't seem too bad.

And if we plan to return/display cause from either function or view,
then shall it be enum 'ReplicationSlotInvalidationCause' or
description/text corresponding to enum?

We clearly can't just expose the numerical value for a C enum. So it has to be
converted to something SQL representable.

In the other feature being discussed "Synchronize slots from primary
to standby" [1] , there is a requirement to replicate invalidation
cause of slot from the primary to standby and thus it is needed in
enum form there. And thus there was a suggestion earlier to have the
function return enum-value and let the view display it as
text/description to the user. So kindly let us know your thoughts.

[1] - /messages/by-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com

Can you point me to a more specific message for that requirement? It seems
pretty odd to me. Your link goes to the top of a 400 message thread, I don't
have time to find one specific design point in that...

Greetings,

Andres

#7shveta malik
shveta.malik@gmail.com
In reply to: Andres Freund (#6)
Re: Track in pg_replication_slots the reason why slots conflict?

On Thu, Dec 21, 2023 at 5:04 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-12-21 16:08:48 +0530, shveta malik wrote:

On Thu, Dec 21, 2023 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:

While listening at Bertrand's talk about logical decoding on standbys
last week at Prague, I got surprised by the fact that we do not
reflect in the catalogs the reason why a conflict happened for a slot.
There are three of them depending on ReplicationSlotInvalidationCause:
- WAL removed.
- Invalid horizon.
- Insufficient WAL level.

It should be extremely rare to hit any of these other than "WAL removed", so
I'm not sure it's worth adding interface complexity to show them.

ReplicationSlotCtl holds this information, so couldn't it be useful
for monitoring purposes to know why a slot got invalidated and add a
column to pg_get_replication_slots()? This could just be an extra
text conflicting_reason, defaulting to NULL when there's nothing to
see.

Extra columns aren't free from a usability perspective. IFF we do something, I
think it should be a single column with a cause.

Thanks for the feedback. But do you mean that we replace existing
'conflicting' column with 'cause' in both the function and view
(pg_get_replication_slots() and pg_replication_slots)? Or do you mean
that we expose 'cause' from pg_get_replication_slots() and use that to
display 'conflicting' in pg_replication_slots view?

I'm not entirely sure I understand the difference - just whether we add one
new column or replace the existing 'conflicting' column? I can see arguments
for either. A conflicting column where NULL indicates no conflict, and other
values indicate the reason for the conflict, doesn't seem too bad.

And if we plan to return/display cause from either function or view,
then shall it be enum 'ReplicationSlotInvalidationCause' or
description/text corresponding to enum?

We clearly can't just expose the numerical value for a C enum. So it has to be
converted to something SQL representable.

In the other feature being discussed "Synchronize slots from primary
to standby" [1] , there is a requirement to replicate invalidation
cause of slot from the primary to standby and thus it is needed in
enum form there. And thus there was a suggestion earlier to have the
function return enum-value and let the view display it as
text/description to the user. So kindly let us know your thoughts.

[1] - /messages/by-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com

Can you point me to a more specific message for that requirement? It seems
pretty odd to me. Your link goes to the top of a 400 message thread, I don't
have time to find one specific design point in that...

It is currently implemented there as a new function
'pg_get_slot_invalidation_cause()' without changing existing view
pg_replication_slots. (See 2.1 in [1]/messages/by-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ@mail.gmail.com where it was introduced).

Then it was suggested in [2]/messages/by-id/CAA4eK1K0KCDNtpDyUKucMRdyK-5KdrCRWakCpHEdHT9muAiEOw@mail.gmail.com to fork a new thread as it makes sense to
have it independent of this slot-synchronization feature.

The new thread forked is [3]/messages/by-id/CAJpy0uBpr0ym12+0mXpjcRFA6N=anX+Yk9aGU4EJhHNu=fWykQ@mail.gmail.com. In that thread, the issues in having a
new function pg_get_slot_invalidation_cause() are discussed and also
we came to know about this very thread that started the next day.

[1]: /messages/by-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ@mail.gmail.com
[2]: /messages/by-id/CAA4eK1K0KCDNtpDyUKucMRdyK-5KdrCRWakCpHEdHT9muAiEOw@mail.gmail.com
[3]: /messages/by-id/CAJpy0uBpr0ym12+0mXpjcRFA6N=anX+Yk9aGU4EJhHNu=fWykQ@mail.gmail.com

thanks
Shveta

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#6)
Re: Track in pg_replication_slots the reason why slots conflict?

On Thu, Dec 21, 2023 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-12-21 16:08:48 +0530, shveta malik wrote:

On Thu, Dec 21, 2023 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:

Extra columns aren't free from a usability perspective. IFF we do something, I
think it should be a single column with a cause.

Thanks for the feedback. But do you mean that we replace existing
'conflicting' column with 'cause' in both the function and view
(pg_get_replication_slots() and pg_replication_slots)? Or do you mean
that we expose 'cause' from pg_get_replication_slots() and use that to
display 'conflicting' in pg_replication_slots view?

I'm not entirely sure I understand the difference - just whether we add one
new column or replace the existing 'conflicting' column? I can see arguments
for either.

Agreed. I think the argument against replacing the existing
'conflicting' column is that there is a chance that it is being used
by some monitoring script which I guess shouldn't be a big deal to
change. So, if we don't see that as a problem, I would prefer to have
a single column with conflict reason where one of its values indicates
there is no conflict.

A conflicting column where NULL indicates no conflict, and other

values indicate the reason for the conflict, doesn't seem too bad.

This is fine too.

And if we plan to return/display cause from either function or view,
then shall it be enum 'ReplicationSlotInvalidationCause' or
description/text corresponding to enum?

We clearly can't just expose the numerical value for a C enum. So it has to be
converted to something SQL representable.

We can return int2 value from the function pg_get_replication_slots()
and then use that to display a string in the view
pg_replication_slots.

--
With Regards,
Amit Kapila.

#9Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#8)
Re: Track in pg_replication_slots the reason why slots conflict?

Hi,

On Thu, Dec 21, 2023 at 07:55:51PM +0530, Amit Kapila wrote:

On Thu, Dec 21, 2023 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:

I'm not entirely sure I understand the difference - just whether we add one
new column or replace the existing 'conflicting' column? I can see arguments
for either.

Agreed. I think the argument against replacing the existing
'conflicting' column is that there is a chance that it is being used
by some monitoring script which I guess shouldn't be a big deal to
change. So, if we don't see that as a problem, I would prefer to have
a single column with conflict reason where one of its values indicates
there is no conflict.

+1

A conflicting column where NULL indicates no conflict, and other

values indicate the reason for the conflict, doesn't seem too bad.

This is fine too.

+1

And if we plan to return/display cause from either function or view,
then shall it be enum 'ReplicationSlotInvalidationCause' or
description/text corresponding to enum?

We clearly can't just expose the numerical value for a C enum. So it has to be
converted to something SQL representable.

We can return int2 value from the function pg_get_replication_slots()
and then use that to display a string in the view
pg_replication_slots.

Yeah, and in the sync slot related work we could use pg_get_replication_slots()
then to get the enum.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#10Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#8)
Re: Track in pg_replication_slots the reason why slots conflict?

On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:

On Thu, Dec 21, 2023 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:

We clearly can't just expose the numerical value for a C enum. So it has to be
converted to something SQL representable.

We can return int2 value from the function pg_get_replication_slots()
and then use that to display a string in the view
pg_replication_slots.

I strongly dislike that pattern. It just leads to complicated views - and
doesn't provide a single benefit that I am aware of. It's much bettter to
simply populate the text version in pg_get_replication_slots().

#11Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#10)
Re: Track in pg_replication_slots the reason why slots conflict?

On Thu, Dec 21, 2023 at 07:26:56AM -0800, Andres Freund wrote:

On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:

We can return int2 value from the function pg_get_replication_slots()
and then use that to display a string in the view
pg_replication_slots.

I strongly dislike that pattern. It just leads to complicated views - and
doesn't provide a single benefit that I am aware of. It's much bettter to
simply populate the text version in pg_get_replication_slots().

I agree that this is a better integration in the view, and that's what
I would do FWIW.

Amit, how much of a problem would it be to do a text->enum mapping
when synchronizing the slots from a primary to a standby? Sure you
could have a system function that does some of the mapping work, but I
am not sure what's the best integration when it comes to the other
patch.
--
Michael

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#11)
Re: Track in pg_replication_slots the reason why slots conflict?

On Fri, Dec 22, 2023 at 5:00 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Dec 21, 2023 at 07:26:56AM -0800, Andres Freund wrote:

On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:

We can return int2 value from the function pg_get_replication_slots()
and then use that to display a string in the view
pg_replication_slots.

I strongly dislike that pattern. It just leads to complicated views - and
doesn't provide a single benefit that I am aware of. It's much bettter to
simply populate the text version in pg_get_replication_slots().

I agree that this is a better integration in the view, and that's what
I would do FWIW.

Amit, how much of a problem would it be to do a text->enum mapping
when synchronizing the slots from a primary to a standby?

There is no problem as such in that. We were trying to see if there is
a more convenient way but let's move by having cause as text from both
the function and view as that seems to be a preferred way.

--
With Regards,
Amit Kapila.

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Bertrand Drouvot (#9)
Re: Track in pg_replication_slots the reason why slots conflict?

On Thu, Dec 21, 2023 at 8:21 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Thu, Dec 21, 2023 at 07:55:51PM +0530, Amit Kapila wrote:

On Thu, Dec 21, 2023 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:

I'm not entirely sure I understand the difference - just whether we add one
new column or replace the existing 'conflicting' column? I can see arguments
for either.

Agreed. I think the argument against replacing the existing
'conflicting' column is that there is a chance that it is being used
by some monitoring script which I guess shouldn't be a big deal to
change. So, if we don't see that as a problem, I would prefer to have
a single column with conflict reason where one of its values indicates
there is no conflict.

+1

Does anyone else have a preference on whether to change the existing
column or add a new one?

--
With Regards,
Amit Kapila.

#14Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#13)
Re: Track in pg_replication_slots the reason why slots conflict?

On Tue, Dec 26, 2023 at 08:44:44AM +0530, Amit Kapila wrote:

Does anyone else have a preference on whether to change the existing
column or add a new one?

Just to be clear here, I'd vote for replacing the existing boolean
with a text.
--
Michael

#15Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#14)
Re: Track in pg_replication_slots the reason why slots conflict?

Hi,

On Tue, Dec 26, 2023 at 05:23:56PM +0900, Michael Paquier wrote:

On Tue, Dec 26, 2023 at 08:44:44AM +0530, Amit Kapila wrote:

Does anyone else have a preference on whether to change the existing
column or add a new one?

Just to be clear here, I'd vote for replacing the existing boolean
with a text.

Same here, I'd vote to avoid 2 columns having the same "meaning".

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#16Isaac Morland
isaac.morland@gmail.com
In reply to: Amit Kapila (#8)
Re: Track in pg_replication_slots the reason why slots conflict?

On Thu, 21 Dec 2023 at 09:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

A conflicting column where NULL indicates no conflict, and other

values indicate the reason for the conflict, doesn't seem too bad.

This is fine too.

I prefer this option. There is precedent for doing it this way, for example
in pg_stat_activity.wait_event_type.

The most common test of this field is likely to be "is there a conflict"
and it's better to write this as "[fieldname] IS NOT NULL" than to
introduce a magic constant. Also, it makes clear to future maintainers that
this field has one purpose: saying what type of conflict there is, if any.
If we find ourselves wanting to record a new non-conflict status (no idea
what that could be: "almost conflict"? "probably conflict soon"?) there
would be less temptation to break existing tests for "is there a conflict".

#17shveta malik
shveta.malik@gmail.com
In reply to: Isaac Morland (#16)
Re: Track in pg_replication_slots the reason why slots conflict?

On Tue, Dec 26, 2023 at 7:35 PM Isaac Morland <isaac.morland@gmail.com> wrote:

On Thu, 21 Dec 2023 at 09:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

A conflicting column where NULL indicates no conflict, and other

values indicate the reason for the conflict, doesn't seem too bad.

This is fine too.

I prefer this option. There is precedent for doing it this way, for example in pg_stat_activity.wait_event_type.

The most common test of this field is likely to be "is there a conflict" and it's better to write this as "[fieldname] IS NOT NULL" than to introduce a magic constant. Also, it makes clear to future maintainers that this field has one purpose: saying what type of conflict there is, if any. If we find ourselves wanting to record a new non-conflict status (no idea what that could be: "almost conflict"? "probably conflict soon"?) there would be less temptation to break existing tests for "is there a conflict".

+1 on using "[fieldname] IS NOT NULL" to find "is there a conflict"

PFA the patch which attempts to implement this.

This patch changes the existing 'conflicting' field to
'conflicting_cause' in pg_replication_slots. This new field is always
NULL for physical slots (like the previous field conflicting), as well
as for those logical slots which are not invalidated.

thanks
Shveta

Attachments:

v1-0001-Track-conflicting_cause-in-pg_replication_slots.patchapplication/octet-stream; name=v1-0001-Track-conflicting_cause-in-pg_replication_slots.patchDownload+92-23
#18Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#17)
Re: Track in pg_replication_slots the reason why slots conflict?

On Wed, Dec 27, 2023 at 3:08 PM shveta malik <shveta.malik@gmail.com> wrote:

PFA the patch which attempts to implement this.

This patch changes the existing 'conflicting' field to
'conflicting_cause' in pg_replication_slots.

This name sounds a bit odd to me, would it be better to name it as
conflict_cause?

A few other minor comments:
=========================
*
+       <structfield>conflicting_cause</structfield> <type>text</type>
+      </para>
+      <para>
+       Cause if this logical slot conflicted with recovery (and so is now
+       invalidated). It is always NULL for physical slots, as well as for
+       those logical slots which are not invalidated. Possible values are:

Would it better to use description as follows:" Cause of logical
slot's conflict with recovery. It is always NULL for physical slots,
as well as for logical slots which are not invalidated. The non-NULL
values indicate that the slot is marked as invalidated. Possible
values are:
.."

*
  $res = $node_standby->safe_psql(
  'postgres', qq(
- select bool_and(conflicting) from pg_replication_slots;));
+ select bool_and(conflicting) from
+ (select conflicting_cause is not NULL as conflicting from
pg_replication_slots);));

Won't the query "select conflicting_cause is not NULL as conflicting
from pg_replication_slots" can return false even for physical slots
and then impact the result of the main query whereas the original
query would seem to be simply ignoring physical slots? If this
observation is correct then you might want to add a 'slot_type'
condition in the new query.

* After introducing check_slots_conflicting_cause(), do we need to
have check_slots_conflicting_status()? Aren't both checking the same
thing?

--
With Regards,
Amit Kapila.

#19shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#18)
Re: Track in pg_replication_slots the reason why slots conflict?

On Wed, Dec 27, 2023 at 4:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 27, 2023 at 3:08 PM shveta malik <shveta.malik@gmail.com> wrote:

PFA the patch which attempts to implement this.

This patch changes the existing 'conflicting' field to
'conflicting_cause' in pg_replication_slots.

This name sounds a bit odd to me, would it be better to name it as
conflict_cause?

A few other minor comments:
=========================
*
+       <structfield>conflicting_cause</structfield> <type>text</type>
+      </para>
+      <para>
+       Cause if this logical slot conflicted with recovery (and so is now
+       invalidated). It is always NULL for physical slots, as well as for
+       those logical slots which are not invalidated. Possible values are:

Would it better to use description as follows:" Cause of logical
slot's conflict with recovery. It is always NULL for physical slots,
as well as for logical slots which are not invalidated. The non-NULL
values indicate that the slot is marked as invalidated. Possible
values are:
.."

*
$res = $node_standby->safe_psql(
'postgres', qq(
- select bool_and(conflicting) from pg_replication_slots;));
+ select bool_and(conflicting) from
+ (select conflicting_cause is not NULL as conflicting from
pg_replication_slots);));

Won't the query "select conflicting_cause is not NULL as conflicting
from pg_replication_slots" can return false even for physical slots
and then impact the result of the main query whereas the original
query would seem to be simply ignoring physical slots? If this
observation is correct then you might want to add a 'slot_type'
condition in the new query.

* After introducing check_slots_conflicting_cause(), do we need to
have check_slots_conflicting_status()? Aren't both checking the same
thing?

I think it is needed for the case where we want to check that there is
no conflict.

# Verify slots are reported as non conflicting in pg_replication_slots
check_slots_conflicting_status(0);

For the cases where there is conflict, I think
check_slots_conflicting_cause() can replace
check_slots_conflicting_status().

thanks
Shveta

#20shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#19)
Re: Track in pg_replication_slots the reason why slots conflict?

On Thu, Dec 28, 2023 at 10:16 AM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Dec 27, 2023 at 4:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 27, 2023 at 3:08 PM shveta malik <shveta.malik@gmail.com> wrote:

PFA the patch which attempts to implement this.

This patch changes the existing 'conflicting' field to
'conflicting_cause' in pg_replication_slots.

This name sounds a bit odd to me, would it be better to name it as
conflict_cause?

A few other minor comments:
=========================

Thanks for the feedback Amit.

*
+       <structfield>conflicting_cause</structfield> <type>text</type>
+      </para>
+      <para>
+       Cause if this logical slot conflicted with recovery (and so is now
+       invalidated). It is always NULL for physical slots, as well as for
+       those logical slots which are not invalidated. Possible values are:

Would it better to use description as follows:" Cause of logical
slot's conflict with recovery. It is always NULL for physical slots,
as well as for logical slots which are not invalidated. The non-NULL
values indicate that the slot is marked as invalidated. Possible
values are:
.."

*
$res = $node_standby->safe_psql(
'postgres', qq(
- select bool_and(conflicting) from pg_replication_slots;));
+ select bool_and(conflicting) from
+ (select conflicting_cause is not NULL as conflicting from
pg_replication_slots);));

Won't the query "select conflicting_cause is not NULL as conflicting
from pg_replication_slots" can return false even for physical slots
and then impact the result of the main query whereas the original
query would seem to be simply ignoring physical slots? If this
observation is correct then you might want to add a 'slot_type'
condition in the new query.

* After introducing check_slots_conflicting_cause(), do we need to
have check_slots_conflicting_status()? Aren't both checking the same
thing?

I think it is needed for the case where we want to check that there is
no conflict.

# Verify slots are reported as non conflicting in pg_replication_slots
check_slots_conflicting_status(0);

For the cases where there is conflict, I think
check_slots_conflicting_cause() can replace
check_slots_conflicting_status().

I have removed check_slots_conflicting_status() and where it was
needed to check non-conflicting, I have added a simple query.

PFA the v2-patch with all your comments addressed.

thanks
Shveta

Attachments:

v2-0001-Track-conflicting_cause-in-pg_replication_slots.patchapplication/octet-stream; name=v2-0001-Track-conflicting_cause-in-pg_replication_slots.patchDownload+87-51
#21Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#20)
#22Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#21)
#23shveta malik
shveta.malik@gmail.com
In reply to: Michael Paquier (#22)
#24shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#23)
#25Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#24)
#26shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#25)
#27shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#26)
#28Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#27)
#29Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#28)
#30Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#29)
#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#30)
#32Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Amit Kapila (#31)
#33vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#31)