Function to get invalidation cause of a replication slot.
Hello hackers,
Attached is a patch which attempts to implement a new system function
pg_get_slot_invalidation_cause('slot_name') to get invalidation cause
of a replication slot.
Currently, users do not have a way to know the invalidation cause. One
can only find out if the slot is invalidated or not by querying the
'conflicting' field of pg_replication_slots. This new function
provides a way to query invalidation cause as well.
One of the use case scenarios for this function is another ongoing
thread "Synchronizing slots from primary to standby" at [1]/messages/by-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com. This
function is needed there to replicate invalidation-cause of a logical
replication slot from the primary server to the hot standby. But this
is an independent requirement in itself and thus makes sense to have
it implemented separately.
Please review and provide your feedback.
[1]: /messages/by-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
thanks
Shveta
Attachments:
v1-0001-Function-to-get-invalidation-cause-of-a-replicati.patchapplication/octet-stream; name=v1-0001-Function-to-get-invalidation-cause-of-a-replicati.patchDownload
From 8a955a031598864e1acbe88c6bac8417319d0fb7 Mon Sep 17 00:00:00 2001
From: Shveta Malik <shveta.malik@gmail.com>
Date: Wed, 20 Dec 2023 10:54:53 +0530
Subject: [PATCH v1] Function to get invalidation cause of a replication slot.
This patch implements a new system function
pg_get_slot_invalidation_cause('slot_name')
to get invalidation cause of a replication slot.
Currently, users do not have a way to know the invalidation cause.
One can only find out if the slot is invalidated or not by querying
the 'conflicting' field of pg_replication_slots. This new function
provides a way to query invalidation cause as well.
This function returns NULL if the replication slot is not found and 0 if
the replication slot is not invalidated. Possible invalidation causes
returned are:
1 = required WAL has been removed.
2 = required rows have been removed.
3 = wal_level insufficient for the slot
---
doc/src/sgml/func.sgml | 33 +++++++++++++++++++++++++++++
src/backend/replication/slotfuncs.c | 27 +++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 4 ++++
3 files changed, 64 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 20da3ed033..0826168192 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27684,6 +27684,39 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_slot_invalidation_cause</primary>
+ </indexterm>
+ <function>pg_get_slot_invalidation_cause</function> ( <parameter>slot_name</parameter> <type>name</type> )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns invalidation cause of the replication slot named
+ <parameter>slot_name</parameter>. Returns NULL if the replication slot
+ is not found and 0 if the replication slot is not invalidated.
+ Possible invalidation causes are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>1</literal> = required WAL has been removed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>2</literal> = required rows have been removed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>3</literal> = wal_level insufficient for the slot
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 4b694a03d0..a8643c621c 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -225,6 +225,33 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * SQL function for getting invalidation cause of a slot.
+ *
+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
+ * returns NULL if slot with given name is not found.
+ *
+ * Returns RS_INVAL_NONE if the given slot is not invalidated.
+ */
+Datum
+pg_get_slot_invalidation_cause(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+ ReplicationSlot *s;
+ ReplicationSlotInvalidationCause cause;
+
+ s = SearchNamedReplicationSlot(NameStr(*name), true);
+
+ if (s == NULL)
+ PG_RETURN_NULL();
+
+ SpinLockAcquire(&s->mutex);
+ cause = s->data.invalidated;
+ SpinLockRelease(&s->mutex);
+
+ PG_RETURN_INT16(cause);
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 77e8b13764..ce001dacb9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11095,6 +11095,10 @@
proname => 'pg_drop_replication_slot', provolatile => 'v', proparallel => 'u',
prorettype => 'void', proargtypes => 'name',
prosrc => 'pg_drop_replication_slot' },
+{ oid => '8484', descr => 'what caused the replication slot to become invalid',
+ proname => 'pg_get_slot_invalidation_cause', provolatile => 's', proisstrict => 't',
+ prorettype => 'int2', proargtypes => 'name',
+ prosrc => 'pg_get_slot_invalidation_cause' },
{ oid => '3781',
descr => 'information about replication slots currently in use',
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
--
2.34.1
Hi,
On 12/20/23 7:01 AM, shveta malik wrote:
Hello hackers,
Attached is a patch which attempts to implement a new system function
pg_get_slot_invalidation_cause('slot_name') to get invalidation cause
of a replication slot.
Thanks! +1 for the idea to display this information through an SQL API.
I wonder if we could add a new field to pg_replication_slots instead
of creating a new function.
One of the use case scenarios for this function is another ongoing
thread "Synchronizing slots from primary to standby" at [1]. This
function is needed there to replicate invalidation-cause of a logical
replication slot from the primary server to the hot standby. But this
is an independent requirement in itself and thus makes sense to have
it implemented separately.
Agree.
My thoughts about it:
1 ===
"Currently, users do not have a way to know the invalidation cause"
I'm not sure about it, I think one could see the reasons in the log file.
2 ===
"This function returns NULL if the replication slot is not found"
Why not returning an error instead? (like pg_drop_replication_slot() does for example)
FWIW, we'd not need to cover this case if the description would be added to pg_replication_slots.
3 ===
+ <para>
+ <literal>3</literal> = wal_level insufficient for the slot
+ </para>
wal_level insufficient on the primary maybe?
4 ===
+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
I think it would be more user friendly to return a description instead of an enum value.
5 ===
doc/src/sgml/func.sgml | 33 +++++++++++++++++++++++++++++++++
src/backend/replication/slotfuncs.c | 27 +++++++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 4 ++++
Worth to add some coverage in 019_replslot_limit.pl and 035_standby_logical_decoding.pl?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
On 12/20/23 7:01 AM, shveta malik wrote:
Hello hackers,
Attached is a patch which attempts to implement a new system function
pg_get_slot_invalidation_cause('slot_name') to get invalidation cause
of a replication slot.Thanks! +1 for the idea to display this information through an SQL API.
I wonder if we could add a new field to pg_replication_slots instead
of creating a new function.
Yeah, that is another option. We already have a boolean column
'conflicting' in pg_replication_slots, so are you suggesting adding a
new column like 'conflict_cause'? I feel it is okay to have an SQL
function for this as it may be primarily used for internal purposes or
for some specific users who want to dig deeper for the invalidation
cause.
One of the use case scenarios for this function is another ongoing
thread "Synchronizing slots from primary to standby" at [1]. This
function is needed there to replicate invalidation-cause of a logical
replication slot from the primary server to the hot standby. But this
is an independent requirement in itself and thus makes sense to have
it implemented separately.Agree.
My thoughts about it:
1 ===
"Currently, users do not have a way to know the invalidation cause"
I'm not sure about it, I think one could see the reasons in the log file.
2 ===
"This function returns NULL if the replication slot is not found"
Why not returning an error instead? (like pg_drop_replication_slot() does for example)
+1. Also, the check for NULL argument should be there.
FWIW, we'd not need to cover this case if the description would be added to pg_replication_slots.
3 ===
+ <para> + <literal>3</literal> = wal_level insufficient for the slot + </para>wal_level insufficient on the primary maybe?
4 ===
+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
I think it would be more user friendly to return a description instead of an enum value.
But it would be tricky to use at a place where we want to know the
enum value as in the case of the sync slots patch where we want to
copy the cause. I think it would be better to display the description
if we want to display it in the view.
--
With Regards,
Amit Kapila.
On Wed, Dec 20, 2023 at 2:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:On 12/20/23 7:01 AM, shveta malik wrote:
Hello hackers,
Attached is a patch which attempts to implement a new system function
pg_get_slot_invalidation_cause('slot_name') to get invalidation cause
of a replication slot.Thanks! +1 for the idea to display this information through an SQL API.
I wonder if we could add a new field to pg_replication_slots instead
of creating a new function.Yeah, that is another option. We already have a boolean column
'conflicting' in pg_replication_slots, so are you suggesting adding a
new column like 'conflict_cause'? I feel it is okay to have an SQL
function for this as it may be primarily used for internal purposes or
for some specific users who want to dig deeper for the invalidation
cause.One of the use case scenarios for this function is another ongoing
thread "Synchronizing slots from primary to standby" at [1]. This
function is needed there to replicate invalidation-cause of a logical
replication slot from the primary server to the hot standby. But this
is an independent requirement in itself and thus makes sense to have
it implemented separately.Agree.
My thoughts about it:
1 ===
"Currently, users do not have a way to know the invalidation cause"
I'm not sure about it, I think one could see the reasons in the log file.
2 ===
"This function returns NULL if the replication slot is not found"
Why not returning an error instead? (like pg_drop_replication_slot() does for example)
+1. Also, the check for NULL argument should be there.
If arg is NULL, it will not come to this function call. It comes to
this function if the signature matches. That is why other functions
like pg_drop_replication_slot(), pg_replication_slot_advance() etc do
not have NULL arg check as well.
FWIW, we'd not need to cover this case if the description would be added to pg_replication_slots.
3 ===
+ <para> + <literal>3</literal> = wal_level insufficient for the slot + </para>wal_level insufficient on the primary maybe?
4 ===
+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
I think it would be more user friendly to return a description instead of an enum value.
But it would be tricky to use at a place where we want to know the
enum value as in the case of the sync slots patch where we want to
copy the cause. I think it would be better to display the description
if we want to display it in the view.--
With Regards,
Amit Kapila.
PFA v2 patch. Addressed below comments:
1) Added test in 019_replslot_limit.pl
2) 'pg_get_slot_invalidation_cause' now returns error if the given
slot does not exist
3) Corrected doc and commit msg.
thanks
Shveta
Attachments:
v2-0001-Function-to-get-invalidation-cause-of-a-replicati.patchapplication/octet-stream; name=v2-0001-Function-to-get-invalidation-cause-of-a-replicati.patchDownload
From fe498c31c2163a69728ad749e365112c65739fe1 Mon Sep 17 00:00:00 2001
From: Shveta Malik <shveta.malik@gmail.com>
Date: Wed, 20 Dec 2023 10:54:53 +0530
Subject: [PATCH v2] Function to get invalidation cause of a replication slot.
This patch implements a new system function
pg_get_slot_invalidation_cause('slot_name')
to get invalidation cause of a replication slot.
This function returns 0 if the replication slot is not invalidated;
else returns invalidation cause. Possible invalidation causes are:
1 = required WAL has been removed.
2 = required rows have been removed.
3 = wal_level insufficient on the primary server
---
doc/src/sgml/func.sgml | 32 +++++++++++++++++++++++
src/backend/replication/slotfuncs.c | 27 +++++++++++++++++++
src/include/catalog/pg_proc.dat | 4 +++
src/test/recovery/t/019_replslot_limit.pl | 6 +++++
4 files changed, 69 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 20da3ed033..2ec0c411ca 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27684,6 +27684,38 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_slot_invalidation_cause</primary>
+ </indexterm>
+ <function>pg_get_slot_invalidation_cause</function> ( <parameter>slot_name</parameter> <type>name</type> )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns invalidation cause of the replication slot named
+ <parameter>slot_name</parameter>. Returns 0 if the given replication
+ slot is not invalidated. Possible invalidation causes are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <literal>1</literal> = required WAL has been removed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>2</literal> = required rows have been removed.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>3</literal> = wal_level insufficient on the primary server
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 4b694a03d0..3ac6c45170 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -225,6 +225,33 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * SQL function for getting invalidation cause of a slot.
+ *
+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
+ * Returns RS_INVAL_NONE if the given slot is not invalidated.
+ */
+Datum
+pg_get_slot_invalidation_cause(PG_FUNCTION_ARGS)
+{
+ Name name = PG_GETARG_NAME(0);
+ ReplicationSlot *s;
+ ReplicationSlotInvalidationCause cause;
+
+ s = SearchNamedReplicationSlot(NameStr(*name), true);
+ if (s == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("replication slot \"%s\" does not exist",
+ NameStr(*name))));
+
+ SpinLockAcquire(&s->mutex);
+ cause = s->data.invalidated;
+ SpinLockRelease(&s->mutex);
+
+ PG_RETURN_INT16(cause);
+}
+
/*
* pg_get_replication_slots - SQL SRF showing all replication slots
* that currently exist on the database cluster.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 77e8b13764..ce001dacb9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11095,6 +11095,10 @@
proname => 'pg_drop_replication_slot', provolatile => 'v', proparallel => 'u',
prorettype => 'void', proargtypes => 'name',
prosrc => 'pg_drop_replication_slot' },
+{ oid => '8484', descr => 'what caused the replication slot to become invalid',
+ proname => 'pg_get_slot_invalidation_cause', provolatile => 's', proisstrict => 't',
+ prorettype => 'int2', proargtypes => 'name',
+ prosrc => 'pg_get_slot_invalidation_cause' },
{ oid => '3781',
descr => 'information about replication slots currently in use',
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7d94f15778..61ac19e974 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -201,6 +201,12 @@ $result = $node_primary->safe_psql(
is($result, "rep1|f|t|lost|",
'check that the slot became inactive and the state "lost" persists');
+$result = $node_primary->safe_psql(
+ 'postgres',
+ qq[SELECT pg_get_slot_invalidation_cause('rep1')]);
+is($result, "1",
+ 'check that the invalidation cause of the slot is obtained correctly');
+
# Wait until current checkpoint ends
my $checkpoint_ended = 0;
for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
--
2.34.1
On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
Hi,
On 12/20/23 7:01 AM, shveta malik wrote:
Hello hackers,
Attached is a patch which attempts to implement a new system function
pg_get_slot_invalidation_cause('slot_name') to get invalidation cause
of a replication slot.Thanks! +1 for the idea to display this information through an SQL API.
I wonder if we could add a new field to pg_replication_slots instead
of creating a new function.One of the use case scenarios for this function is another ongoing
thread "Synchronizing slots from primary to standby" at [1]. This
function is needed there to replicate invalidation-cause of a logical
replication slot from the primary server to the hot standby. But this
is an independent requirement in itself and thus makes sense to have
it implemented separately.Agree.
My thoughts about it:
1 ===
"Currently, users do not have a way to know the invalidation cause"
I'm not sure about it, I think one could see the reasons in the log file.
2 ===
"This function returns NULL if the replication slot is not found"
Why not returning an error instead? (like pg_drop_replication_slot() does for example)
FWIW, we'd not need to cover this case if the description would be added to pg_replication_slots.
3 ===
+ <para> + <literal>3</literal> = wal_level insufficient for the slot + </para>wal_level insufficient on the primary maybe?
4 ===
+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
I think it would be more user friendly to return a description instead of an enum value.
5 ===
doc/src/sgml/func.sgml | 33 +++++++++++++++++++++++++++++++++
src/backend/replication/slotfuncs.c | 27 +++++++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 4 ++++Worth to add some coverage in 019_replslot_limit.pl and 035_standby_logical_decoding.pl?
I have recently added a test in 019_replslot_limit.pl in v2. Do you
suggest adding in 035_standby_logical_decoding as well? Will it have
additional benefits?
thanks
Shveta
Hi,
On 12/20/23 9:50 AM, Amit Kapila wrote:
On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:On 12/20/23 7:01 AM, shveta malik wrote:
Hello hackers,
Attached is a patch which attempts to implement a new system function
pg_get_slot_invalidation_cause('slot_name') to get invalidation cause
of a replication slot.Thanks! +1 for the idea to display this information through an SQL API.
I wonder if we could add a new field to pg_replication_slots instead
of creating a new function.Yeah, that is another option. We already have a boolean column
'conflicting' in pg_replication_slots, so are you suggesting adding a
new column like 'conflict_cause'?
Yes.
I feel it is okay to have an SQL
function for this as it may be primarily used for internal purposes or
for some specific users who want to dig deeper for the invalidation
cause.4 ===
+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
I think it would be more user friendly to return a description instead of an enum value.
But it would be tricky to use at a place where we want to know the
enum value as in the case of the sync slots patch where we want to
copy the cause.
Yeah right, what about displaying both then? (one field for the enum and one for
the description). I feel it's not friendly to ask users to refer to the documentation
(or the commit message) to get the meaning of the output.
Another option could be to create a new view, say pg_slot_invalidation_causes, that would list
them all (cause_id, cause_description) and then keep pg_get_slot_invalidation_cause() returning
the cause_id only.
Also I think we should add a comment in ReplicationSlotInvalidationCause definition (slot.h)
that any new enum values (if any) should be added after the ones that are already defined (to
provide some consistency across changes in this area if any).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On 12/20/23 10:57 AM, shveta malik wrote:
On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand
Worth to add some coverage in 019_replslot_limit.pl and 035_standby_logical_decoding.pl?
I have recently added a test in 019_replslot_limit.pl in v2.
Thanks!
Do you
suggest adding in 035_standby_logical_decoding as well? Will it have
additional benefits?
That would cover the remaining invalidation causes but I'm not sure
that's worth it as this new function is simple enough after all.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Dec 20, 2023 at 4:10 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
On 12/20/23 9:50 AM, Amit Kapila wrote:
On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:4 ===
+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
I think it would be more user friendly to return a description instead of an enum value.
But it would be tricky to use at a place where we want to know the
enum value as in the case of the sync slots patch where we want to
copy the cause.Yeah right, what about displaying both then? (one field for the enum and one for
the description). I feel it's not friendly to ask users to refer to the documentation
(or the commit message) to get the meaning of the output.Another option could be to create a new view, say pg_slot_invalidation_causes, that would list
them all (cause_id, cause_description) and then keep pg_get_slot_invalidation_cause() returning
the cause_id only.
I am not sure if it is really valuable enough to have a separate view
but if others also feel that would be a good idea then we can do it. I
feel for the current purpose having a function as proposed in the
patch is good enough, we can always extend it or add a new view
dependening on if some users really care about it.
--
With Regards,
Amit Kapila.
Hi,
On 12/20/23 10:55 AM, shveta malik wrote:
On Wed, Dec 20, 2023 at 2:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
PFA v2 patch. Addressed below comments:
1) Added test in 019_replslot_limit.pl
2) 'pg_get_slot_invalidation_cause' now returns error if the given
slot does not exist
3) Corrected doc and commit msg.
Thanks!
+ <literal>3</literal> = wal_level insufficient on the primary server
"." is missing at the end (to be consistent with 1 and 2). Same
in the commit message.
+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
Not sure the sentence should finish with ";".
Another Nit is to add a comment in ReplicationSlotInvalidationCause definition (slot.h)
that any new enum values (if any) should be added after the ones that are already defined (to
provide some consistency across changes in this area if any).
Except the above Nit(s) the patch LGTM.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Dec 20, 2023 at 12:43:45PM +0100, Drouvot, Bertrand wrote:
+ <literal>3</literal> = wal_level insufficient on the primary server
"." is missing at the end (to be consistent with 1 and 2). Same
in the commit message.+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
Not sure the sentence should finish with ";".
Another Nit is to add a comment in ReplicationSlotInvalidationCause definition (slot.h)
that any new enum values (if any) should be added after the ones that are already defined (to
provide some consistency across changes in this area if any).Except the above Nit(s) the patch LGTM.
This patch has a problem: this design can easily cause the report of
data inconsistent with pg_replication_slots because the data returned
by pg_get_replication_slots() and pg_get_slot_invalidation_cause()
would happen across *two* function call contexts, no? So, it seems to
me that this had better be integrated as an extra column of
pg_get_replication_slots() to ensure the consistency of the data
reported. And it's critical to get consistent data for monitoring.
Not to mention that the lookup had better happen also while holding
ReplicationSlotControlLock as well, which is also something
InvalidatePossiblyObsoleteSlot() relies on. v2 ignores that.
--
Michael
On Thu, Dec 21, 2023 at 8:47 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Dec 20, 2023 at 12:43:45PM +0100, Drouvot, Bertrand wrote:
+ <literal>3</literal> = wal_level insufficient on the primary server
"." is missing at the end (to be consistent with 1 and 2). Same
in the commit message.+ * Returns ReplicationSlotInvalidationCause enum value for valid slot_name;
Not sure the sentence should finish with ";".
Another Nit is to add a comment in ReplicationSlotInvalidationCause definition (slot.h)
that any new enum values (if any) should be added after the ones that are already defined (to
provide some consistency across changes in this area if any).Except the above Nit(s) the patch LGTM.
This patch has a problem: this design can easily cause the report of
data inconsistent with pg_replication_slots because the data returned
by pg_get_replication_slots() and pg_get_slot_invalidation_cause()
would happen across *two* function call contexts, no? So, it seems to
me that this had better be integrated as an extra column of
pg_get_replication_slots() to ensure the consistency of the data
reported. And it's critical to get consistent data for monitoring.
It depends on how one uses the function. For example, if one finds
there is a conflict via pg_get_replication_slots() and wants to check
the reason for the same via this new function then it would give the
correct answer. Now, if we think it is okay to have two columns
'conflicting' and 'conflict_reason/conflict_cause' to be returned by
pg_get_replication_slots() then that should serve the purpose as well
but one can argue that 'conflicting' is deducible from
'conflict_reason'.
The other thing we want to decide is how useful will it be to display
this information via pg_replication_slots view considering we already
have a boolean for it. Shall we add yet another column in the view and
if so, it would probably be a string rather than an enum? Now, if we
do so, we still want function pg_get_replication_slots() or
pg_get_slot_invalidation_cause() to return an enum value as we want
that to be synced/copied to standby.
--
With Regards,
Amit Kapila.
On Thu, Dec 21, 2023 at 09:37:39AM +0530, Amit Kapila wrote:
It depends on how one uses the function. For example, if one finds
there is a conflict via pg_get_replication_slots() and wants to check
the reason for the same via this new function then it would give the
correct answer.
My point is that we may not get the "correct" answer at all :/
What guarantee do you have that between the scan of
pg_get_replication_slots() to get the value of "conflicting" and the
call of pg_get_slot_invalidation_cause() the slot state will still be
the same? A lot could happen between both function calls while the
repslot LWLock is not hold.
Now, if we think it is okay to have two columns
'conflicting' and 'conflict_reason/conflict_cause' to be returned by
pg_get_replication_slots() then that should serve the purpose as well
but one can argue that 'conflicting' is deducible from
'conflict_reason'.
Yeah, you could keep the reason text as NULL when there is no
conflict, replacing the boolean by the text in the function, and keep
the view definition compatible with v16 while adding an extra column.
--
Michael
On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Dec 21, 2023 at 09:37:39AM +0530, Amit Kapila wrote:
It depends on how one uses the function. For example, if one finds
there is a conflict via pg_get_replication_slots() and wants to check
the reason for the same via this new function then it would give the
correct answer.My point is that we may not get the "correct" answer at all :/
What guarantee do you have that between the scan of
pg_get_replication_slots() to get the value of "conflicting" and the
call of pg_get_slot_invalidation_cause() the slot state will still be
the same?
Yeah, if one uses them independently then there is no such guarantee.
A lot could happen between both function calls while the
repslot LWLock is not hold.Now, if we think it is okay to have two columns
'conflicting' and 'conflict_reason/conflict_cause' to be returned by
pg_get_replication_slots() then that should serve the purpose as well
but one can argue that 'conflicting' is deducible from
'conflict_reason'.Yeah, you could keep the reason text as NULL when there is no
conflict, replacing the boolean by the text in the function, and keep
the view definition compatible with v16 while adding an extra column.
But as mentioned we also want the enum value to be exposed in some way
so that it can be used by the sync slot feature [1]/messages/by-id/OS0PR01MB5716DAF72265388A2AD424119495A@OS0PR01MB5716.jpnprd01.prod.outlook.com as well,
otherwise, we may need some mappings to convert the text back to an
enum. I guess if we want to expose via view, then we can return an
enum value by pg_get_replication_slots() and the view can replace it
with text based on the value.
[1]: /messages/by-id/OS0PR01MB5716DAF72265388A2AD424119495A@OS0PR01MB5716.jpnprd01.prod.outlook.com
--
With Regards,
Amit Kapila.
On Thu, Dec 21, 2023 at 11:53:04AM +0530, Amit Kapila wrote:
On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier <michael@paquier.xyz> wrote:
Yeah, if one uses them independently then there is no such guarantee.
This could be possible in the same query as well, still less likely,
as the contents are volatile.
A lot could happen between both function calls while the
repslot LWLock is not hold.Yeah, you could keep the reason text as NULL when there is no
conflict, replacing the boolean by the text in the function, and keep
the view definition compatible with v16 while adding an extra column.But as mentioned we also want the enum value to be exposed in some way
so that it can be used by the sync slot feature [1] as well,
otherwise, we may need some mappings to convert the text back to an
enum. I guess if we want to expose via view, then we can return an
enum value by pg_get_replication_slots() and the view can replace it
with text based on the value.
Sure. Something like is OK by me as long as the data is retrieved
from a single scan of the slot data while holding the slot data's
LWLock.
--
Michael
On Thu, Dec 21, 2023 at 12:07 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Dec 21, 2023 at 11:53:04AM +0530, Amit Kapila wrote:
On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier <michael@paquier.xyz> wrote:
Yeah, if one uses them independently then there is no such guarantee.This could be possible in the same query as well, still less likely,
as the contents are volatile.
True, this is quite obvious but that was not a recommended way to use
the function. Anyway, now that we agree to expose it via an existing
function, there is no point in further argument on this.
A lot could happen between both function calls while the
repslot LWLock is not hold.Yeah, you could keep the reason text as NULL when there is no
conflict, replacing the boolean by the text in the function, and keep
the view definition compatible with v16 while adding an extra column.But as mentioned we also want the enum value to be exposed in some way
so that it can be used by the sync slot feature [1] as well,
otherwise, we may need some mappings to convert the text back to an
enum. I guess if we want to expose via view, then we can return an
enum value by pg_get_replication_slots() and the view can replace it
with text based on the value.Sure. Something like is OK by me as long as the data is retrieved
from a single scan of the slot data while holding the slot data's
LWLock.
Okay, so let's go this way unless someone feels otherwise.
--
With Regards,
Amit Kapila.
On Thu, Dec 21, 2023 at 2:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 21, 2023 at 12:07 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Dec 21, 2023 at 11:53:04AM +0530, Amit Kapila wrote:
On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier <michael@paquier.xyz> wrote:
Yeah, if one uses them independently then there is no such guarantee.This could be possible in the same query as well, still less likely,
as the contents are volatile.True, this is quite obvious but that was not a recommended way to use
the function. Anyway, now that we agree to expose it via an existing
function, there is no point in further argument on this.A lot could happen between both function calls while the
repslot LWLock is not hold.Yeah, you could keep the reason text as NULL when there is no
conflict, replacing the boolean by the text in the function, and keep
the view definition compatible with v16 while adding an extra column.But as mentioned we also want the enum value to be exposed in some way
so that it can be used by the sync slot feature [1] as well,
otherwise, we may need some mappings to convert the text back to an
enum. I guess if we want to expose via view, then we can return an
enum value by pg_get_replication_slots() and the view can replace it
with text based on the value.Sure. Something like is OK by me as long as the data is retrieved
from a single scan of the slot data while holding the slot data's
LWLock.Okay, so let's go this way unless someone feels otherwise.
Please track the progress in another thread [1]/messages/by-id/ZYOE8IguqTbp-seF@paquier.xyz where the patch is posted today.
[1]: /messages/by-id/ZYOE8IguqTbp-seF@paquier.xyz
thanks
Shveta